RSS Feed
Method R Discussion Group

Method R helps developers, DBAs, and decision-makers optimize Oracle-based software in every phase of the software life cycle.

Discuss Method R products, techniques, and events.

Ask new questions or post your own responses.

Case Study: Winning with Histograms

Having just finished the OPINT class last week, I came to work ready to tune. I decided to present this as sort of a case study. In my company it almost takes an act of Congress to install any outside software, even if it is free like the Hotsos test harness. So it will probably be a couple of months before I can install the test harness on a test system. So I had to use the concepts but not the test harness in this case.

Our business complaint was general slowness of the system and various times. We had already tuned some business critical queries earlier, and I am now looking for queries that don't run terrible, but run often that could be improved. Using this query against v$sql, I identified the top ten queries running in production that accumulated the most time:

SELECT *
FROM
(SELECT
s.sql_text,
ROUND((s.elapsed_time/100000),2) Total_Elapsed_Sec,
case s.executions
WHEN 0 THEN round(s.elapsed_time/1000000,2)
ELSE Round((s.elapsed_time/s.executions)/1000000,2 )
END as Elapsed_Per_Exec,
s.buffer_gets,
s.disk_reads,
case s.cpu_time
WHEN 0 THEN 0
ELSE Round((s.cpu_time/1000000),2 )
END as CPU_Secs,
s.loads,
s.sorts,
s.parse_calls,
s.executions,
s.fetches,
s.rows_processed,
s.FIRST_LOAD_TIME,
s.module,
s.child_number,
s.hash_value,
s.address,
s.plan_hash_value,
s.optimizer_cost,
s.sharable_mem,
s.action,
s.invalidations,
s.parsing_user_id
FROM v$sql s
ORDER BY s.elapsed_time desc)
WHERE ROWNUM < 11

One of those queries is a regularly scheduled query that looks like this:

SELECT *
FROM maximo.wfassignment
WHERE (processname = 'PRAPPR'
AND escrole = 'AUTOACCEPT'
AND assignstatus = 'ACTIVE')
AND NOT EXISTS
(SELECT ownerid
FROM maximo.escrepeattrack
WHERE ownerid = wfassignmentid
AND objectname = 'WFASSIGNMENT'
AND refpointid = 51
AND escalation = 'PRAUTOAPPR')
------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | -------------------------------------------------------| 0 | SELECT STATEMENT | | 94 | 16356 | 6583 | |* 1 | FILTER | | | | | |* 2 | TABLE ACCESS FULL | WFASSIGNMENT | 94 | 16356 | 6395 | |* 3 | INDEX UNIQUE SCAN | ESCREPEATTRACK_NDX | 1 | 27 | 2 | -----------------------------------------------------------Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "MAXIMO"."ESCREPEATTRACK" "ESCREPEATTRACK" WHERE "ESCREPEATTRACK"."OWNERID"=:B1 AND "ESCREPEATTRACK"."OBJECTNAME"='WFASSIGNMENT' AND "ESCREPEATTRACK"."REFPOINTID"=51 AND "ESCREPEATTRACK"."ESCALATION"='PRAUTOAPPR'))
2 - filter("SYS_ALIAS_1"."PROCESSNAME"='PRAPPR' AND "SYS_ALIAS_1"."ESCROLE"='AUTOACCEPT' AND "SYS_ALIAS_1"."ASSIGNSTATUS"='ACTIVE')
3 - access("ESCREPEATTRACK"."ESCALATION"='PRAUTOAPPR' AND "ESCREPEATTRACK"."REFPOINTID"=51 AND "ESCREPEATTRACK"."OBJECTNAME"='WFASSIGNMENT' AND "ESCREPEATTRACK"."OWNERID"=:B1) Note: cpu costing off

(Oracle V9.2 Database)
This query took about 67K LIO's. Since 05-14-2007 this query has run 1485 times averaging about 14 seconds per execution and returning some rows only about half the time. Total cumulative time processing during those 1485 executions was 5.6 hours with over 100 million LIO's of which 54 million were disk reads.

Here is why I think the Oracle optimizer chooses a bad access path plan for this query.
Looking at the WHERE clause of wfassignment, we see criteria for these three columns processname, escrole, assignstatus.
I looked at the data distribution, which would have been easier with the test harness...

SELECT COUNT(*) AS Count, processname
FROM maximo.wfassignment
GROUP BY processname
ORDER BY COUNT(*)

COUNT PROCESSNAME
------- -----------
2 SELFREG
24 CONTAPPR
1090 INVPRCDISC
1869 INVFRTDISC
6048 MOCAPPR
12721 INVEXTDISC
36609 INVFNLAPPR
57193 NVQTYDISC
64415 POAPPR
78053 PRAPPR
808444 WOAPPR
1484139 INVAPPR

SELECT COUNT(*) AS Count, escrole
FROM maximo.wfassignment
GROUP BY escrole
ORDER BY COUNT(*)

COUNT ESCROLE
-------- ----------
20620 SUPERVISOR
29180 WF_SUPVSR
330704 AUTOACCEPT
2170105 (NULL)

SELECT COUNT(*) AS Count, assignstatus
FROM maximo.wfassignment
GROUP BY assignstatus
ORDER BY COUNT(*)

COUNT ASSIGNSTATUS
------- ------------------
592 DEFAULT
1354 ESCALATED
5632 FORWARDED
23845 ACTIVE
750020 INACTIVE
1769168 COMPLETE

So what this all means is that the data values are heavily skewed in all three tables.
Processname is not very selective when you look for 'INVAPPR', but it is fairly selective when looking for 'PRAPPR' as we are in this query. In the escrole column, we are looking for 'AUTOACCEPT', which has 330K rows so it isn't very selective.
Assignstatus is fairly selective if we are looking for 'ACTIVE', but not if we are looking for "COMPLETE'.

Out of the box, Oracle assumes all data to be equally distributed. Therefore it doesn't know that in this query, it would be best too look at assignstatus first (only 24K rows to look for), then filter the other criteria against those 24K rows. Instead, it does a full table scan of wfassignment and filters them all.

So, according to class, this is a job for histograms. When this type of statistics is gathered against the wfassignent processname, escrole, assignstatus columns, Oracle understands the data skew and uses it to determine the best access path. We just made the Oracle optimizer smarter about this table!

I ran the commands for such a histogram in our test environment. After that, the optimizer did choose to look at assignstatus first, and since there was already an index available with assignstatus in it, the query ran 0.16 seconds instead of 14. It had only about 6K LIO's compared to 65K LIO's originally.

PLAN_TABLE_OUTPUT -------------------------
| Id | Operation | Name | Rows | Bytes | Cost | -------------------------
| 0 | SELECT STATEMENT | | 4 | 708 | 1288 | |* 1 | FILTER | | | | | |* 2 | TABLE ACCESS BY INDEX ROWID| WFASSIGNMENT | 4 | 708 | 1280 | |* 3 | INDEX RANGE SCAN | WFASSIGN_NDX4 | 26134 | | 196 | |* 4 | INDEX UNIQUE SCAN | ESCREPEATTRACK_NDX | 1 | 27 | 2 | -----------------------------------------------------------Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "MAXIMO"."ESCREPEATTRACK""ESCREPEATTRACK" WHERE "ESCREPEATTRACK"."OWNERID"=:B1 AND "ESCREPEATTRACK"."OBJECTNAME"='WFASSIGNMENT' AND "ESCREPEATTRACK"."REFPOINTID"=51 AND "ESCREPEATTRACK"."ESCALATION"='PRAUTOAPPR')) 2 - filter("SYS_ALIAS_1"."PROCESSNAME"='PRAPPR' AND "SYS_ALIAS_1"."ESCROLE"='AUTOACCEPT') 3 - access("SYS_ALIAS_1"."ASSIGNSTATUS"='ACTIVE') 4 - access("ESCREPEATTRACK"."ESCALATION"='PRAUTOAPPR' AND "ESCREPEATTRACK"."REFPOINTID"=51 AND "ESCREPEATTRACK"."OBJECTNAME"='WFASSIGNMENT' AND "ESCREPEATTRACK"."OWNERID"=:B1) Note: cpu costing is off

The advantage of using the histogram over an index hint is that other queries, including ad-hoc queries may well benefit when similar criteria is used.

BEGIN
DBMS_STATS.gather_table_stats(
ownname => 'MAXIMO',
tabname => 'WFASSIGNMENT',
method_opt =>'FOR COLUMNS PROCESSNAME SIZE 12, ESCROLE SIZE 5, ASSIGNSTATUS SIZE 7');
END;

Using these histograms, I would anticipate the next 1485 executions to consume only about 4 minutes of time and 9 million LIO's, a definite win!

Charles Young

Friday, June 15, 2007

Before = 336 minutes
After = 4 minutes

Before = 100M LIOs
After = 9M LIOs

...and done in a manner that's likely to have collateral benefits upon other parts of the application.

Sounds like a pretty good day's work to me. Thanks VERY much, Charles, for sharing.

Cary Millsap

Friday, June 15, 2007

It's great to see how you were able to apply the knowledge gained from the class to a performance issue right away and with such great success!

Karen Morton

Friday, June 15, 2007

Glad to hear the Histogram section of Chapter 3 worked out for you so well. And , good luck on that act of Congress !!! hds.sql would have come in handy for you.

Monty Orme

Wednesday, June 20, 2007

This topic is archived. No further replies will be accepted.

RSS Feed Method R Discussion Group

Case Study: Winning with Histograms

RSS Feed
Method R Discussion Group