Method R Discussion Group

Method R helps developers, DBAs, and decision-makers optimize Oracle-based software in every phase of the software life cycle.

Discuss Method R products, techniques, and events.

Ask new questions or post your own responses.

Need some help thinking outside the box

I have a query that runs hourly in a large application.
When the application was using Oralce 9i, it ran it a couple of minutes.  Now that we have upgraded to Oracle 10.2.0.3 the query runs in about 7 hours or so in production.  On the test Oracle server (a little slower box) I left the query running over the weekend.  After 68 hours I killed the session.  Efforts to 10046 trace have not completed because I quickly run into 2gig log file limits on my system.  The first part of the trace file does show almost exclusively single block reads with db_file_sequential_read.  There are almost no other wait events (on test server). 

So, I am reading indexes and maybe I shouldn't in this case.  I have rewriten the query several ways but never really get it to finish on my test box.  gv$sql showed that during the 68 hours it accumulated some 702 MILLION disk I/O with nearly one BILLION buffer gets.

I am thinking the problem is that the query looks at the same 3 million row table three times, conceptually looping a subset of itself with itself.
Basically, the idea is this: Workorders can be parents which means they have other workorders that are children to them.  When users change the status to complete or close out all of chilren, they often don't change the status of the parent.  This query identifies all workorders that have AT LEAST one child, and where ALL of it's children have been completed, closed out or cancelled.

The query:
SELECT *
  FROM maximo.workorder
 WHERE istask = 0
  AND historyflag = 0
  AND wopriority <> 1
  AND status IN ('APPR', 'INPLN', 'INPRG', 'RDY', 'WMATL', 'WSCH', 'PEND')
  AND EXISTS (SELECT * -- this checks that the workorder is parent to one or more children workorders
                FROM maximo.workorder w2
                WHERE w2.PARENT = workorder.wonum AND w2.siteid = workorder.siteid AND w2.orgid = workorder.orgid)
  AND NOT EXISTS (SELECT *
                    FROM maximo.wfinstance
                    WHERE ownertable = 'WORKORDER' AND ownerid = workorderid AND active = 1 AND processname = 'WOAPPR')
  AND NOT EXISTS (
          SELECT * -- this says to only return workorders where ALL of it's children are CLOSED out
            FROM maximo.workorder w2
          WHERE w2.PARENT = workorder.wonum
            AND w2.siteid = workorder.siteid
            AND w2.orgid = workorder.orgid
            AND w2.status NOT IN ('COMP', 'CAN', 'CLOSE'))
  AND (  workorder.cstm_mocnum IS NULL
        OR (SELECT status
              FROM maximo.cstm_moc
            WHERE cstm_mocnum = workorder.cstm_mocnum AND siteid = workorder.siteid AND orgid = workorder.orgid) IN
                                                                                          ('CAN', 'COMP', 'PSSRCOMP')
      )
  AND NOT EXISTS (
          SELECT *
            FROM maximo.wpmaterial,
                maximo.invreserve
          WHERE wpmaterial.wonum = workorder.wonum
            AND wpmaterial.orgid = workorder.orgid
            AND wpmaterial.siteid = workorder.siteid
            AND invreserve.orgid = workorder.orgid
            AND invreserve.siteid = workorder.siteid
            AND invreserve.wonum = workorder.wonum
            AND wpmaterial.itemnum = invreserve.itemnum)

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name              | Rows  | Bytes |TempSpc| Cost (%CPU)| Time    |
---------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                |                    |    1 |  468 |      | 59880  (2)| 00:11:59 |
|*  1 |  FILTER                          |                    |      |      |      |            |          |
|  2 |  NESTED LOOPS ANTI              |                    |    1 |  468 |      | 59871  (2)| 00:11:59 |
|  3 |    NESTED LOOPS ANTI            |                    |    1 |  448 |      | 59868  (2)| 00:11:59 |
|*  4 |    HASH JOIN SEMI              |                    |    1 |  423 |    15M| 59856  (2)| 00:11:59 |
|  5 |      INLIST ITERATOR            |                    |      |      |      |            |          |
|*  6 |      TABLE ACCESS BY INDEX ROWID| WORKORDER          | 38912 |    15M|      |  4562  (1)| 00:00:55 |
|*  7 |        INDEX RANGE SCAN          | CSTM_WO_ST_HST_TSK | 40702 |      |      |  205  (1)| 00:00:03 |
|*  8 |      TABLE ACCESS FULL          | WORKORDER          |  2215K|    31M|      | 51651  (2)| 00:10:20 |
|*  9 |    TABLE ACCESS BY INDEX ROWID  | WFINSTANCE        |  2104 | 52600 |      |    12  (0)| 00:00:01 |
|* 10 |      INDEX RANGE SCAN            | CSTM_WFINST_OWNER  |  215 |      |      |    3  (0)| 00:00:01 |
|* 11 |    TABLE ACCESS BY INDEX ROWID  | WORKORDER          |    1 |    20 |      |    3  (0)| 00:00:01 |
|* 12 |    INDEX RANGE SCAN            | WORKORDER_NDX6    |    2 |      |      |    2  (0)| 00:00:01 |
|  13 |  TABLE ACCESS BY INDEX ROWID    | CSTM_MOC          |    1 |    19 |      |    2  (0)| 00:00:01 |
|* 14 |    INDEX UNIQUE SCAN            | CSTM_MOC_NDX2      |    1 |      |      |    1  (0)| 00:00:01 |
|* 15 |    TABLE ACCESS BY INDEX ROWID  | WPITEM            |    1 |    29 |      |    7  (0)| 00:00:01 |
|  16 |    NESTED LOOPS                |                    |    1 |    55 |      |    9  (0)| 00:00:01 |
|* 17 |      TABLE ACCESS BY INDEX ROWID | INVRESERVE        |    1 |    26 |      |    2  (0)| 00:00:01 |
|* 18 |      INDEX RANGE SCAN          | INVRESERVE_NDX3    |    1 |      |      |    1  (0)| 00:00:01 |
|* 19 |      INDEX RANGE SCAN            | CSTM_WPITEM_WONUM  |    6 |      |      |    2  (0)| 00:00:01 |
|* 20 |    INDEX RANGE SCAN            | SYNONYMDOM_NDX1    |    1 |    27 |      |    2  (0)| 00:00:01 |
|* 21 |      INDEX RANGE SCAN            | SYNONYMDOM_NDX1    |    1 |    27 |      |    2  (0)| 00:00:01 |
|* 22 |      INDEX RANGE SCAN          | SYNONYMDOM_NDX1    |    1 |    27 |      |    2  (0)| 00:00:01 |
|* 23 |        INDEX RANGE SCAN          | SYNONYMDOM_NDX1    |    1 |    27 |      |    2  (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  1 - filter(("WORKORDER"."CSTM_MOCNUM" IS NULL OR  (SELECT /*+ */ "STATUS" FROM "MAXIMO"."CSTM_MOC"
              "CSTM_MOC" WHERE "CSTM_MOCNUM"=:B1 AND "SITEID"=:B2 AND "ORGID"=:B3)='CAN' OR  (SELECT /*+ */ "STATUS"
              FROM "MAXIMO"."CSTM_MOC" "CSTM_MOC" WHERE "CSTM_MOCNUM"=:B4 AND "SITEID"=:B5 AND "ORGID"=:B6)='COMP'
              OR  (SELECT /*+ */ "STATUS" FROM "MAXIMO"."CSTM_MOC" "CSTM_MOC" WHERE "CSTM_MOCNUM"=:B7 AND
              "SITEID"=:B8 AND "ORGID"=:B9)='PSSRCOMP') AND  NOT EXISTS (SELECT /*+ */ 0 FROM "MAXIMO"."INVRESERVE"
              "INVRESERVE",MAXIMO."WPITEM" "WPITEM" WHERE "WPITEM"."SITEID"=:B10 AND "WPITEM"."WONUM"=:B11 AND
              "WPITEM"."ITEMNUM" IS NOT NULL AND "WPITEM"."ORGID"=:B12 AND ("LINETYPE"= (SELECT /*+ */ "VALUE" FROM
              MAXIMO."SYNONYMDOMAIN" "SYNONYMDOMAIN" WHERE "MAXVALUE"='ITEM' AND "DOMAINID"='LINETYPE') OR
              "LINETYPE"= (SELECT /*+ */ "VALUE" FROM MAXIMO."SYNONYMDOMAIN" "SYNONYMDOMAIN" WHERE
              "MAXVALUE"='MATERIAL' AND "DOMAINID"='LINETYPE') OR "LINETYPE"= (SELECT /*+ */ "VALUE" FROM
              MAXIMO."SYNONYMDOMAIN" "SYNONYMDOMAIN" WHERE "MAXVALUE"='EXTERNAL' AND "DOMAINID"='LINETYPE') OR
              "LINETYPE"= (SELECT /*+ */ "VALUE" FROM MAXIMO."SYNONYMDOMAIN" "SYNONYMDOMAIN" WHERE
              "MAXVALUE"='SPORDER' AND "DOMAINID"='LINETYPE')) AND "WPITEM"."ITEMNUM"="INVRESERVE"."ITEMNUM" AND
              "INVRESERVE"."WONUM"=:B13 AND "INVRESERVE"."SITEID"=:B14 AND "INVRESERVE"."ORGID"=:B15))
  4 - access("W2"."PARENT"="WORKORDER"."WONUM" AND "W2"."SITEID"="WORKORDER"."SITEID" AND
              "W2"."ORGID"="WORKORDER"."ORGID")
  6 - filter("WOPRIORITY"<>1)
  7 - access(("STATUS"='APPR' OR "STATUS"='INPLN' OR "STATUS"='INPRG' OR "STATUS"='PEND' OR
              "STATUS"='RDY' OR "STATUS"='WMATL' OR "STATUS"='WSCH') AND "HISTORYFLAG"=0 AND "ISTASK"=0)
  8 - filter("W2"."PARENT" IS NOT NULL)
  9 - filter("ACTIVE"=1 AND "PROCESSNAME"='WOAPPR' AND "OWNERID"="WORKORDERID")
  10 - access("OWNERTABLE"='WORKORDER')
  11 - filter("W2"."PARENT" IS NOT NULL AND "W2"."STATUS"<>'COMP' AND "W2"."STATUS"<>'CAN' AND
              "W2"."STATUS"<>'CLOSE' AND "W2"."PARENT"="WORKORDER"."WONUM" AND "W2"."ORGID"="WORKORDER"."ORGID")
  12 - access("W2"."SITEID"="WORKORDER"."SITEID")
  14 - access("ORGID"=:B1 AND "SITEID"=:B2 AND "CSTM_MOCNUM"=:B3)
  15 - filter("WPITEM"."ITEMNUM" IS NOT NULL AND "WPITEM"."ORGID"=:B1 AND ("LINETYPE"= (SELECT /*+ */
              "VALUE" FROM MAXIMO."SYNONYMDOMAIN" "SYNONYMDOMAIN" WHERE "MAXVALUE"='ITEM' AND "DOMAINID"='LINETYPE')
              OR "LINETYPE"= (SELECT /*+ */ "VALUE" FROM MAXIMO."SYNONYMDOMAIN" "SYNONYMDOMAIN" WHERE
              "MAXVALUE"='MATERIAL' AND "DOMAINID"='LINETYPE') OR "LINETYPE"= (SELECT /*+ */ "VALUE" FROM
              MAXIMO."SYNONYMDOMAIN" "SYNONYMDOMAIN" WHERE "MAXVALUE"='EXTERNAL' AND "DOMAINID"='LINETYPE') OR
              "LINETYPE"= (SELECT /*+ */ "VALUE" FROM MAXIMO."SYNONYMDOMAIN" "SYNONYMDOMAIN" WHERE
              "MAXVALUE"='SPORDER' AND "DOMAINID"='LINETYPE')) AND "WPITEM"."ITEMNUM"="INVRESERVE"."ITEMNUM")
  17 - filter("INVRESERVE"."ORGID"=:B1)
  18 - access("INVRESERVE"."SITEID"=:B1 AND "INVRESERVE"."WONUM"=:B2)
  19 - access("WPITEM"."WONUM"=:B1 AND "WPITEM"."SITEID"=:B2)
  20 - access("DOMAINID"='LINETYPE' AND "MAXVALUE"='ITEM')
  21 - access("DOMAINID"='LINETYPE' AND "MAXVALUE"='MATERIAL')
  22 - access("DOMAINID"='LINETYPE' AND "MAXVALUE"='EXTERNAL')
  23 - access("DOMAINID"='LINETYPE' AND "MAXVALUE"='SPORDER')

You see there are other tables besides the workorder involved in the filtering, but I don't think they are so much the issue because if I remove the IF EXISTS workorder subslect and the IF NOT EXists workorder subselect the query runs in 1.5 minutes.
So I am trying to think of a better way to accomplish this.
One thing I did was replace the NOT EXISTS workorder subquery with this:
  AND 0 = (
          SELECT COUNT(*) -- this says to only return workorders where ALL of it's children are CLOSED out
            FROM maximo.workorder w2
          WHERE w2.PARENT = workorder.wonum
            AND w2.siteid = workorder.siteid
            AND w2.orgid = workorder.orgid
            AND w2.status NOT IN ('COMP', 'CAN', 'CLOSE'))

This query never returns either, just sits there racking up I/O for days.

If I run a query looking only for workorders that are parents, there are only 19,049 rows that meet that criteria. So for each one of those rows, I have to look at all of its children to see if they are all closed out. An index is being used to get these children. I see how that is a fair amount of work, but 1 BILLION buffer gets?

So, I'm looking bad here guys.  I spent a lot of company money on the OPINT class and came back thinking I was pretty slick.  Now I have spent about 7 days of company time working on one stupid little query.  I'm not impressing anybody....
Charles Young Send private email
Monday, December 10, 2007
 
 
Hi Charles,

some quick thoughts:
I guess you don't have the plan, which was used in 9i.

Why not downgrade your test database to 9i and verify that the statement completes there. Then you can save the plan (get it from V$SQL_PLAN, SQL trace or Statspack, but not from EXPLAIN PLAN. The latter may lie. You could also create a stored outline for the statement in 9i.

Then you could compare the plan from 9i with the one from 10g. And you might also import the stored outline from 9i into 10g. This should make the statement use the same plan as was used in 9i. It's a good idea to capture plans with Statspack before an upgrade and to export the Statspack tables. Then you retain access to the old plans. Import Statspack dump and check the old plan.

I would also consider that the query might be looping due to a bug. If you can hand the tables to Oracle support as a 9i dump file and some proof that it works fine in 9i but not in 10g, this would justify a TAR.

Have you checked optimizer statistics? I have seen 10g automatic statistics collection using a sample size which was much too small and resulted in statistics being quite inaccurate.

You might also try creating system statistics with DBMS_STATS, hoping to get a better plan. This is trial and error though. But it doesn't waste much time. I could mail you a script for that, if you need it. It gather system stats over several intervals. Then it's up to you, to calculate an average for CPUSPEED, SREADTIM, MREADTIM, etc. and import into the data dictionary.

I also recommend reading Cost-Based Oracle Fundamentals by Jonathan Lewis.

Kind regards,
Norbert
Norbert Debes Send private email
Tuesday, December 11, 2007
 
 
Thanks Norbert.  I do have a 9i environment and can verify that the query runs in about 2 minutes there.  I tried manually hinting the plan to make it look like the 9i version; but couldn't quite get the same because 10g has apparently has some new paths that it kept wanting to use.  I didn't think of the stored outlines, do you know for sure that a stored outline would move from 9i to 10g? I will see about doing that.

Haven't called Oracle support yet, it is something of a protocol process here and I have to go through another IT group to do it - just making sure I have done what I can before I go there.

Thanks.
Charles Young Send private email
Tuesday, December 11, 2007
 
 
Improving index selectivity (create more selective indexes so that the filtering occurs in the index and reduces the need to do table accesses) might help.  I'd also consider a CONNECT BY to establish the hierarchical relationship within the table and to identify the children without having to make multiple accesses of the table.  If that's not appropriate for your needs, then I'd also suggest subquery factoring (WITH clause) to reduce the multiple accesses to WORKORDER.
Karen Morton Send private email
Tuesday, December 11, 2007
 
 
Hi Charles,

it's a drag to track something like this down. Very time consuming. I know how you feel after investing so much time.

I've personally never transported outlines between databases, but I know that it's supported.

Since you do get the good plan in 9i, you should definetely create an outline for it. The outline will contain all the hints that you need to specify. The problem with hinting is that
1. there are query blocks (pretty much undocumented). If you don't place the hint in the right query block or don't specify the query block with the hint, the optimizer may not know what to do with the hint
2. as long as you don't specify a full set of hints, the optimizer needs to make some decisions itself and may end up with a plan that ignores your hints. An outline contains all the hints to specify the ENTIRE plan.

You can transport outlines with exp/imp. It should work between 9i and 10g. The 10g Performance Tuning Guide has some material on stored oulines/plan stability.  Outlines may be edited with PL/SQL or OEM.

I would also take a 10053 CBO trace of both 9i and 10g. This will also give you differences in optimizer environment between 9i and 10g. Jonathan Lewis devotes a chapter to the 10053 trace. It will tell you whether the 10g CBO considers the plan that is used in 9i. If it does, it must have higher cost in 10g than it did in 9i.

In 10g, the 10053 trace contains all the hints that would be used for a stored ouline. Here's an example:
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('10.2.0.1')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
INDEX(@"SEL$1" "L"@"SEL$1" ("LOCATIONS"."CITY"))
INDEX(@"SEL$1" "D"@"SEL$1" ("DEPARTMENTS"."LOCATION_ID"))
INDEX(@"SEL$1" "EMP"@"SEL$1" ("EMPLOYEES"."DEPARTMENT_ID"))
INDEX(@"SEL$1" "J"@"SEL$1" ("JOBS"."JOB_ID"))
INDEX(@"SEL$1" "MGR"@"SEL$1" ("EMPLOYEES"."EMPLOYEE_ID"))
LEADING(@"SEL$1" "L"@"SEL$1" "D"@"SEL$1" "EMP"@"SEL$1" "J"@"SEL$1" "MGR"@"SEL$1")
USE_NL(@"SEL$1" "D"@"SEL$1")
USE_NL(@"SEL$1" "EMP"@"SEL$1")
USE_NL(@"SEL$1" "J"@"SEL$1")
USE_NL(@"SEL$1" "MGR"@"SEL$1")
END_OUTLINE_DATA
*/

The LEADING hint controls the join order. You might use the hints from the 10053 trace as a starting point. @"SEL$1" is an example of a query block name.

You mention new paths. Have you played with optimizer_features_enable? If new access methods, which were not available in 9i, are the problem, you might disable them in 10g with OPTIMIZER_FEATURES_ENABLE=9.2.0.

Kind regards, Norbert
Norbert Debes Send private email
Tuesday, December 11, 2007
 
 
I'd actually use an outline as my last option.  There is a reason the optimizer is doing what it is doing and for the long run, you need to determine what you need to do to get the plan you want without having to force it if possible.  If this is a firefight and you need it to work "now", then an outline could be a short term solution, but I don't think it should be your ultimate solution.

I still think a rewrite will get you there.  I'll be anxious to hear if CONNECT BY, a WITH clause, or some other alternative gets you where you need to be.
Karen Morton Send private email
Tuesday, December 11, 2007
 
 
So a coworker was better at thinking outside of the box than I was.  Here was his solution which runs in about 3 minutes. It required no outlines or hints.

SELECT w1.*
  FROM maximo.workorder w1,
      (SELECT parent, siteid, orgid -- get set of all children that have at least one completed workorder
      FROM maximo.workorder w3
      WHERE w3.PARENT is not null
        AND status in ('COMP', 'CAN', 'CLOSE')
      GROUP BY parent, siteid, orgid
      minus
      SELECT parent, siteid, orgid -- remove from above set if that same parent, siteid, orgid has some children that are NOT completed
      FROM maximo.workorder w4
      WHERE w4.PARENT is not null
        AND status not in ('COMP', 'CAN', 'CLOSE')
      GROUP BY parent, siteid, orgid) w2 -- use this as an inline view
  WHERE w1.istask = 0
  AND w1.historyflag = 0
  AND w1.wopriority <> 1
  AND w1.status IN ('APPR', 'INPLN', 'INPRG', 'RDY', 'WMATL', 'WSCH', 'PEND')
  AND w1.wonum = w2.parent
  AND w1.siteid = w2.siteid
  AND w1.orgid = w2.orgid
  -- no changes to original query below this point
  AND NOT EXISTS (SELECT *
                    FROM maximo.wfinstance
                    WHERE ownertable = 'WORKORDER' AND ownerid = w1.workorderid AND active = 1 AND processname = 'WOAPPR')
  AND ( w1.cstm_mocnum IS NULL
        OR (SELECT status
              FROM maximo.cstm_moc
            WHERE cstm_mocnum = w1.cstm_mocnum AND siteid = w1.siteid AND orgid = w1.orgid) IN
                                                                                            ('CAN', 'COMP', 'PSSRCOMP')
      )
  AND NOT EXISTS (
          SELECT *
            FROM maximo.wpmaterial,
                maximo.invreserve
          WHERE wpmaterial.wonum = w1.wonum
            AND wpmaterial.orgid = w1.orgid
            AND wpmaterial.siteid = w1.siteid
            AND invreserve.orgid = w1.orgid
            AND invreserve.siteid = w1.siteid
            AND invreserve.wonum = w1.wonum
            AND wpmaterial.itemnum = invreserve.itemnum)

I went over to him this afternoon and told him it was a brillant solution and asked his thinking on how he got there.
He said that he too identified that all of that work against the workorder table three times was the main issue.
From there, he said he just thought about what the query was trying to accomplish and some other way to get there.
Using the inline view to produce a single list to join with worked well.

So, while I was spending all of this time with my nifty tools having the DBA's install the Harness and try to get it running, figure out LIO's, 10046 trace files and such, he just sat down and re-wrote the query in a way I hadn't thought of.  The only tool he used was explain plan.  In a few hours he had achieved what I have spent over a week on.  I did contribute some in adding some columns to an existing index which allowed the query to get workorder information directly from the index in an Index Fast Scan instead of doing full table scans; but honestly even with the full table scans it runs in production in about 3 minutes which is acceptable to the business.  Still, we will suggest adding the index to help reduce resource consumption since this query is run once an hour.

So, I am sitting here wondering what are my lessons learned?
Charles Young Send private email
Wednesday, December 12, 2007
 
 
I'd say that the primary lesson is to do what we suggest in class:  "tune" the question - not the query. 

Tools and metrics help you see where your current problems are and help prove your solution's effectiveness.  But, the key to finding the solution is to first know what the query is trying to achieve. 

Without knowing about the question (or the data model, etc), many solutions can be proposed.  That's why stored outlines, hints and other syntax options like CONNECT BY or the WITH clause were suggested to you by Norbert and I.  Suggestions can help you "think outside the box", but your colleague was able to get to the heart of the matter as you noted: He identified the problem and thought about what the query was trying to accomplish and wrote the query as needed to get there.  The response time confirmed he'd chosen wisely.

So, remember to use your metrics to help identify problem areas and as confirmation that your solutions are efficient.  The metrics will shed light on where the issues are: in this case, you could identify that the issue was primarily with the multiple accesses on the workorder table.  But, don't expect the metrics to tell you how to write the query.  That's actually good for us!  I mean, if a tool existed that could write the query for us, then we wouldn't have jobs, right?  :)
Karen Morton Send private email
Wednesday, December 12, 2007
 
 
Karen you are right, the metrics all pointed me to the fact that there was way too many LIO's and the query needed a major rewrite as you suggested earlier. Part of what was leading me astray was the knowledge that it worked okay as written in 9i.  This kept me thinking I didn't really need to rewrite it, I just needed to get it to run with a similar explain plan in 10g.

My other failing was in having seen how the query was written, I just couldn't think of another way to write it.  I couldn't "think outside the box" of the existing query. 

I want to thank you and Norbet for your help.  I was stuck and you guys were giving me suggestions which I am sure would have eventually led to a solution - maybe not my coworker's solution but something workable.
Charles Young Send private email
Thursday, December 13, 2007
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Discussion Groups Main Page

Powered by FogBugz