Method R Discussion Group

Method R helps developers, DBAs, and decision-makers optimize Oracle-based software in every phase of the software life cycle.
After analyzing an Oracle extended SQL trace file for a SELECT statement issued against an Oracle 10.2.0.3 database on Windows, we noticed extremely exaggerated elapsed times for some timed events. For example, 162 'db file scattered read' events had elapsed times > 268 s, and some 'db file sequential read' events had elapsed times > 550 s. What is really confusing is that the events with these extremely large elapsed times seem to have a timestamp that occurs before the timestamp of the previous call! How is this possible?
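To quantify skew like this, the per-event wait totals can be tallied straight from the WAIT lines of the trace file. This is a minimal sketch, not Oracle or Method R tooling; the regex assumes the 10.2-style WAIT line format, and the sample lines are illustrative only:

```python
import re
from collections import defaultdict

# Aggregate wait count and total elapsed time (microseconds) per event name
# from WAIT lines of an extended SQL trace.  Field order can differ between
# Oracle releases; this pattern assumes the 10.2 format.
WAIT_RE = re.compile(r"WAIT #\d+: nam='([^']+)' ela= ?(\d+)")

def wait_totals(lines):
    totals = defaultdict(lambda: [0, 0])  # event name -> [count, total ela]
    for line in lines:
        m = WAIT_RE.search(line)
        if m:
            totals[m.group(1)][0] += 1
            totals[m.group(1)][1] += int(m.group(2))
    return dict(totals)

sample = [
    "WAIT #28: nam='db file sequential read' ela= 1293469516 file#=1 "
    "block#=53425 blocks=1 obj#=0 tim=5588436818",
    "WAIT #28: nam='db file scattered read' ela= 845 file#=4 "
    "block#=120 blocks=8 obj#=51423 tim=5588440000",
]
for name, (count, ela_us) in wait_totals(sample).items():
    print(f"{name}: {count} waits, {ela_us / 1e6:.3f} s")
```

An event contributing hundreds of seconds from a handful of waits stands out immediately in such a tally.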
Take the following excerpt of consecutive lines from the trace file:

PARSING IN CURSOR #28 len=121 dep=3 uid=0 oct=3 lid=0 tim=5589942658 hv=3150898423 ad='8da1cae0'
select /*+ rule */ bucket, endpoint, col#, epvalue from histgrm$ where obj#=:1 and intcol#=:2 and row#=:3 order by bucket
END OF STMT
EXEC #28:c=0,e=106,p=0,cr=0,cu=0,mis=0,r=0,dep=3,og=3,tim=5589942656
WAIT #28: nam='db file sequential read' ela= 1293469516 file#=1 block#=53425 blocks=1 obj#=0 tim=5588436818

Note the three timestamp values in these lines:

1. tim=5589942658
2. tim=5589942656  <= 2 microseconds before statement #1 executed
3. tim=5588436818  <= 1,505,838 microseconds (about 1.5 seconds) before statement #2 executed

The 'db file sequential read' line has the massively overblown elapsed time of ela=1293469516. Does anyone know what might cause Oracle to think time is moving backwards? Could this be because the server is constantly updating its time from some sort of internet time server, or something similar?
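Backwards-running tim values like these can be flagged mechanically. Here is a minimal sketch, assuming the 10.2 trace line format shown in the excerpt; the 60-second plausibility cap on ela is an arbitrary illustrative threshold, not anything Oracle defines:

```python
import re

# Microsecond timestamp ("tim=") and elapsed ("ela=") fields in 10.2-style
# extended SQL trace lines, as in the excerpt above.
TIM_RE = re.compile(r"\btim=(\d+)")
ELA_RE = re.compile(r"\bela= ?(\d+)")

def find_backwards_time(lines, max_plausible_ela_us=60_000_000):
    """Yield (line_no, kind, detail) for suspicious timing entries."""
    prev_tim = None
    for n, line in enumerate(lines, 1):
        m = ELA_RE.search(line)
        if m and int(m.group(1)) > max_plausible_ela_us:
            yield n, "huge-ela", int(m.group(1))
        m = TIM_RE.search(line)
        if m:
            tim = int(m.group(1))
            if prev_tim is not None and tim < prev_tim:
                yield n, "tim-went-backwards", tim - prev_tim
            prev_tim = tim

excerpt = [
    "PARSING IN CURSOR #28 len=121 dep=3 uid=0 oct=3 lid=0 "
    "tim=5589942658 hv=3150898423 ad='8da1cae0'",
    "EXEC #28:c=0,e=106,p=0,cr=0,cu=0,mis=0,r=0,dep=3,og=3,tim=5589942656",
    "WAIT #28: nam='db file sequential read' ela= 1293469516 file#=1 "
    "block#=53425 blocks=1 obj#=0 tim=5588436818",
]
for line_no, kind, detail in find_backwards_time(excerpt):
    print(line_no, kind, detail)
```

Run over a whole trace file, this makes it easy to see whether the backwards jumps are isolated or systematic.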
The last time that I saw time moving backwards in a trace file was on a version 9.0.1 database on Windows. It was a one-time anomaly, and subsequent traces of the same program behaved properly.
After checking with the Hotsos Consulting and R&D teams, the consensus was that timestamps in trace files occasionally "wrap" or run backwards, usually due to instrumentation bugs in the Oracle kernel or oddities in the operating system. Tracing the same program again should result in a "correct" trace file.

While it is annoying when a trace file is "incorrect", after analyzing thousands of trace files over the years, I can count on one hand the number that exhibited this problem. Trace files contain the most accurate picture of what occurred in an Oracle process, but on rare occasions such as the one you experienced, they show that even accurate software still has flaws.

In spite of the inaccurate timings in the trace file, the Hotsos Profiler should still be able to show you the worst-performing code in the trace. In my case with the 9.0.1 database, Profiler showed a large number of calls across a database link, and as usual, the AMOUNT of I/O was the cause of the performance issue more so than the time each call took to complete. It is unfortunate that the current version of 10g experienced this issue, but I suggest tracing the program again to see if the problem recurs.
Thanks very much, Craig, for your input. However, in our case the results are reproducible every time the SQL is executed, and the exaggerated wait event times are also showing up in AWR reports. We've also found these exaggerated wait times and backwards-time issues on other database servers in the enterprise.
We created a TAR with Oracle, and they claim the cause is that the Oracle server is updating its clock from an external time source, such as an internet time server. If this is the case, I'm curious whether anyone else has encountered this problem. Since our trace files are showing inaccurate event timings, it's nearly impossible to follow Method R to find our true performance problem, because these exaggerated event timings skew the result. Has anyone else encountered this issue on a consistent basis?
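Oracle's theory is at least mechanically plausible: a wall clock that an NTP client (or a manual reset) steps backwards will make any interval computed from it come out negative, which is exactly the arithmetic behind a negative tim delta. This is why operating systems also expose a monotonic clock. A small Python illustration of the distinction (not Oracle code; Python is used here only to demonstrate the two clock types):

```python
import time

# time.time() reads the wall clock, which an NTP client or manual reset can
# step backwards, so intervals computed from it can be negative -- the same
# arithmetic that produces negative tim= deltas in a trace file.
# time.monotonic() is guaranteed by the OS never to run backwards.
wall_start = time.time()
mono_start = time.monotonic()

# ... imagine the OS steps the wall clock back 1.5 s somewhere in here ...
for _ in range(100_000):
    pass

wall_delta = time.time() - wall_start        # could go negative after a step
mono_delta = time.monotonic() - mono_start   # always >= 0

print(mono_delta >= 0)
```

A timing source derived from the wall clock would exhibit precisely the symptoms described above whenever the clock is stepped during a timed operation.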