Closed
Bug 475343
Opened 16 years ago
Closed 16 years ago
Region maintenance can get very expensive
Categories
(Core :: Web Painting, defect, P2)
Tracking
()
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: roc)
Details
(Keywords: fixed1.9.1, perf)
Attachments
(5 files, 1 obsolete file)
|
49 bytes,
image/gif
|
Details | |
|
4.06 KB,
text/html
|
Details | |
|
2.48 KB,
patch
|
jaas
:
review+
roc
:
approval1.9.1+
|
Details | Diff | Splinter Review |
|
2.00 KB,
patch
|
vlad
:
review+
roc
:
approval1.9.1+
|
Details | Diff | Splinter Review |
|
1.60 KB,
patch
|
roc
:
review+
roc
:
approval1.9.1+
|
Details | Diff | Splinter Review |
Quoting from email:
------------------------------------------------------------------------
It seems Firefox is unusually slow compared to Chrome on the following JavaScript Mandelbrot Set:
http://warp.povusers.org/MandScripts/javascript2.html (click on Proceed in the bottom)
The problem, however, is not in JavaScript component (less than 1% of the cycles is in js3250.dll). Following is the VTune data about retired instructions. A huge 80% of the total cycles is spent in gkgfx, and 85% of gkgfx is in nsRegion.cpp. A lot of rectangle insertions, deletions, and traversals are happening.
------------------------------------------------------------------------
------------------------------------------------------------------------
It looks like we spend (on my hardware) about 1300ms getting to close().
Opera takes 1500, Safari 400. After that, though, we have a
multi-second unresponsive hang, which is presumably what Moh had profiled.
ccing roc and vlad, since this stuff is their kettle of fish.
It looks like the images load, one by one, and invalidate their rects.
There are 10,000 such rects, of course.
So we basically end up under [ChildView drawRect:] called from NSView
code, and from there spend our time in a 2-1 ratio under
nsViewManager::Refresh and nsThebesRegion::Union. The latter is
certainly the building up of the region under drawRect. The former is
largely the nsRegion::Or call from ConvertNativeRegionToAppRegion,with
some smattering of actually painting. The time spent painting is almost
entirely under nsDisplaClip::OptimizeVisibility doing region operations.
In other words, we spend more time sorting out the right region to paint
that it would take to just paint everything.
That reminds me strongly of bug 253001. Would it make
sense to apply a similar approach here? It looks like the opaque region
thing got backed out as part of the display list landing, because all
that code went away. But we should probably do it for both damage and
opaque regions.
------------------------------------------------------------------------
Flags: wanted1.9.2?
| Reporter | ||
Comment 1•16 years ago
|
||
| Reporter | ||
Comment 2•16 years ago
|
||
| Assignee | ||
Updated•16 years ago
|
Flags: wanted1.9.1? → wanted1.9.1+
| Assignee | ||
Comment 5•16 years ago
|
||
For some reason my profiles look nothing like what bz described, although I do get a region with 30,000 rects in it.
| Assignee | ||
Comment 6•16 years ago
|
||
We use SimpleSubtract in nsDisplayItem::OptimizeVisibility, so we might burn some cycles there unnecessarily but it shouldn't be too bad.
| Assignee | ||
Comment 7•16 years ago
|
||
This patch should take the heat off the spots that Boris was talking about. It doesn't seem to make any difference to me, which isn't surprising since I don't see those as hotspots.
| Assignee | ||
Comment 8•16 years ago
|
||
Boris, was the second section of comment #0 your own testing? If so, maybe you could try this patch.
| Reporter | ||
Comment 9•16 years ago
|
||
Yes, the second section was my own testing. And yes, this patch makes the beachballing I observed better (though we still seem to have a tad of lag there; not sure where; it's noticeably less snappy than Safari or Opera).
Just out of curiosity, what _did_ your profiles look like?
| Reporter | ||
Comment 10•16 years ago
|
||
Profiling the part after the DOM is constructed with the patch, the time seems to be spent under nsIFrame::Invalidate. In particular, we call nsChildView::Invalidate, which calls [NSView setNeedsDisplayInRect;] which calls [NSRegion addRect], which calls CGSUnionRegionWithRect, which calls shape_union_with_bounds, which calls shape_union. All the time is spent in shape_union, as far as I can see. setNeedsDisplayInRect and everything below it looks like it's part of AppKit or CoreGraphics, so if we're to do something about this we'd need to do it inside the nsChildView::Invalidate method, right?
| Assignee | ||
Comment 11•16 years ago
|
||
My profiles showed a lot of time in frame construction and most of the rest of the time in inputstream callbacks, processing the image loads I presume.
These profiles are in debug builds, which is somewhat bogus but should be OK here.
I think I lied in comment #5. I got a display list with 30,000 items in it but I didn't prove that the region has thousands of rectangles. Are you on Tiger or Leopard? If you're on Tiger, I wonder if Leopard is doing some kind of throttling of the size of the update region.
Fixing nsChildView::Invalidate to avoid a problem in Appkit would be tricky. We'd have to track our own idea of what the dirty rect is and bash Appkit's rect with a bounding-rect if it's getting too complicated. Ugh.
| Reporter | ||
Comment 12•16 years ago
|
||
What were you profiling? Your profile sounds like the profile of the script execution, whereas I was profiling the beachball that we get when the image loads come in...
I'm on Leopard.
| Assignee | ||
Comment 13•16 years ago
|
||
What I see after I press the button is a delay, then the document is replaced by the Mandelbrot image overlaid with a grid which I presume is unloaded image alt-rendering, then there's another delay and the grid disappears, I presume because the images have loaded.
I profiled both delay periods separately. Maybe I forgot to check deep enough inside the image load stuff, I'll have another go.
| Reporter | ||
Comment 14•16 years ago
|
||
Yeah, that second delay is what I was profiling. Over here it was 4 seconds or so (compared to 1 sec for the first delay).
Comment 15•16 years ago
|
||
Also, note that "View -> Reload" is very slow. On Windows, the reload progress is observable almost line by line.
| Assignee | ||
Comment 16•16 years ago
|
||
I just reran the test. For me it's about 4 seconds for the initial delay and (feels like) less for the second delay.
I profiled the second delay with manual stop/start. It might be better if we instrumented the test to automatically start and stop Shark.
My profile of the second delay spends about 22% of time painting along two different paths, about 54% of the time in PresShell::WillPaint flushing stuff (30% ProcessPendingRestyles and 23.5% ProcessReflowCommands; I presume this is due to the alt-image stuff being replaced by the real image frames), and 23.5% of the time is in nsInputStreamPump::OnInputStreamReady.
In the input stream pump, 15% is OnStopRequest and 8% is OnDataAvailable. The OnStopRequest cost seems to all be secure browser and progress listener work. In OnDataAvailable we end up spending about 6% in nsIFrame::Invalidate ... eventually we spend 1.5% in nsChildView::Invalidate.
Focusing on painting we spend 37% of the time under nsCSSRendering::PaintBackground, 11% of the time in nsImageFrame::PaintImage, and the rest (i.e. about 50%) is mostly frittered away on display list stuff, which isn't too bad since this testcase is pessimal for display lists.
Remember this is a debug build. I don't know if that explains why my profile is quite different from yours. I guess I'll try with an opt build at some point but it takes a while to make on, I'll need to do it over lunch or at night or something.
Do you think it's worth going ahead with the patch I attached?
| Reporter | ||
Comment 17•16 years ago
|
||
Huh. That's totally different from what I saw... My profiling was in fact with an opt build.
The attached patch does help me, so it might be worth doing. That said, it's a patch in Mac-specific code. The original performance issue was observed on Windows, I assume (c.f. comparison to Chrome and use of VTune to profile). Do we need similar fixes for our other widget impls?
Comment 18•16 years ago
|
||
It's rare to find DEBUG code faster than opt, but could that be happening here?
roc, you love dogfooding debug builds but they are terrible profiler targets. Can you at least reproduce bz's results for opt?
/be
| Assignee | ||
Comment 19•16 years ago
|
||
We can do something similar for Windows, sure. I'll do a patch after I've had several hours to build in my Windows VM.
Brendan: yeah, I'll do that later.
| Assignee | ||
Comment 20•16 years ago
|
||
OK, in my Mac opt build, changeset 784122b49e33 (no patches), the delay between displaying the Mandelbrot image covered by the grid of missing-image borders, and then displaying the final image, is less than a second.
| Assignee | ||
Comment 21•16 years ago
|
||
Maybe it's a network issue.
| Assignee | ||
Comment 22•16 years ago
|
||
In my Windows opt build with the same changeset, I also see less than a second's delay.
| Assignee | ||
Comment 23•16 years ago
|
||
This does for Windows what the other patch does for Mac. I'd be interested in hearing if it helps. Of course, since my builds are already pretty quick for some reason, it doesn't make any difference for me.
Comment 24•16 years ago
|
||
roc,
This is quite an improvement. Previously, on Win32, the distribution of sampled instructions was:
Module %INST #INST
------------ ------ --------------
gkgfx.dll 80.52% 31,236,600,000
win32k.sys 6.37% 2,471,600,000
gklayout.dll 4.98% 1,931,200,000
xpcom_core.dll 1.75% 679,400,000
ntoskrnl.exe 1.07% 414,200,000
thebes.dll 0.77% 298,200,000
imglib2.dll 0.76% 293,600,000
ntdll.dll 0.71% 274,800,000
gkparser.dll 0.67% 258,200,000
hal.dll 0.53% 205,400,000
js3250.dll 0.51% 196,600,000
With your patch in Comment #23, the profile is:
Module %INST #INST
--------------- ----- -------------
gklayout.dll 39.15% 1,909,600,000
xpcom_core.dll 14.63% 713,800,000
ntdll.dll 7.06% 344,600,000
imglib2.dll 6.52% 318,200,000
thebes.dll 6.29% 306,800,000
gkparser.dll 5.29% 258,200,000
js3250.dll 4.39% 214,200,000
The contribution of gkgfx.dll to the exec time has dropped from 81% to 2%.
Altogether, now it takes 2 secs to paint.
The profile of hot functions in gklayout.dll currently is:
Name %INST #INST
---------------------- ----- -----------
SelectorMatches 7.46% 142,400,000
EnumerateAllRules 1.69% 32,200,000
HasStateDependentStyle 1.59% 30,400,000
RuleProcessorData 1.34% 25,600,000
Equals 1.27% 24,200,000
HasFlag 1.25% 23,800,000
GetOffsetTo 1.18% 22,600,000
QueryInterface 1.01% 19,200,000
| Assignee | ||
Comment 25•16 years ago
|
||
Comment on attachment 361466 [details] [diff] [review]
fix?
OK, sounds like we should take this...
Attachment #361466 -
Flags: review?(joshmoz)
| Assignee | ||
Comment 26•16 years ago
|
||
Comment on attachment 361927 [details] [diff] [review]
Windows fix?
And this...
Attachment #361927 -
Flags: review?(vladimir)
| Assignee | ||
Comment 27•16 years ago
|
||
I haven't even compiled this, but Karl, if you could build it and test it, that'd be great :-)
Attachment #361962 -
Flags: review?(mozbugz)
| Assignee | ||
Updated•16 years ago
|
Whiteboard: [needs review]
Comment on attachment 361927 [details] [diff] [review]
Windows fix?
This looks fine, but why don't we put MAX_RECTS_IN_REGION in nsBaseWidget.h under a suitable name to avoid duplicating it in 3 different places? Or do you foresee us wanting it to be different on different platforms?
Attachment #361927 -
Flags: review?(vladimir) → review+
| Assignee | ||
Comment 29•16 years ago
|
||
I don't really care, but I don't see a reason why it has to be the same on each platform.
Comment 30•16 years ago
|
||
Comment on attachment 361962 [details] [diff] [review]
fix
Without this patch, the maximum number of rectangles that I'm seeing in the region from the expose event on x86_64/Linux is 14, so this patch is not going to make any difference here.
After clicking proceed, I see a delay of 3 seconds before the grid becomes visible, then ~1.5 seconds before the grid disappears. Reloading the page also takes ~4.5 seconds.
Are these times slower than expected? If so, I can do some profiling.
Are faster times expected on x86/32 systems? (Possibly resulting in more rectangles accumulating.)
Attachment #361962 -
Flags: review?(mozbugz)
| Assignee | ||
Comment 31•16 years ago
|
||
Those times are consistent with what I'm seeing, but not consistent with what Boris and Moh see.
If GDK doesn't guarantee to cap the complexity of the update region, I'd actually still like to take the patch; it's simple enough.
Comment 32•16 years ago
|
||
(In reply to comment #31)
> If GDK doesn't guarantee to cap the complexity of the update region, I'd
> actually still like to take the patch;
The GDK code doesn't look like it applies any limit on the complexity.
> it's simple enough.
This is a little simpler. We don't need to use gdk_region_get_clipbox because the bounding box is already available in the GdkEventExpose.
I assume that, if we expect to receive a region with a large number of rects, then we don't want to require a complex clip from cairo?
ctx->NewPath();
for (r = rects; r < r_end; ++r) {
ctx->Rectangle(gfxRect(r->x, r->y, r->width, r->height));
}
ctx->Clip();
This patch clips only to the bounding box when the number of rects in the region is large.
Attachment #361962 -
Attachment is obsolete: true
Attachment #362173 -
Flags: review?(roc)
| Assignee | ||
Comment 33•16 years ago
|
||
Yeah, we don't want a complex clip rect in cairo.
| Assignee | ||
Updated•16 years ago
|
Attachment #362173 -
Flags: review?(roc) → review+
Attachment #361466 -
Flags: review?(joshmoz) → review+
| Assignee | ||
Updated•16 years ago
|
Whiteboard: [needs review] → [needs landing]
| Assignee | ||
Comment 34•16 years ago
|
||
Pushed
http://hg.mozilla.org/mozilla-central/rev/98f8a3ecddb6
http://hg.mozilla.org/mozilla-central/rev/c9ea013c5552
http://hg.mozilla.org/mozilla-central/rev/b470ee776528
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing] → [needs 191 approval]
| Assignee | ||
Updated•16 years ago
|
Attachment #361466 -
Flags: approval1.9.1+
| Assignee | ||
Comment 35•16 years ago
|
||
Comment on attachment 361466 [details] [diff] [review]
fix?
I'm not sure I should be approving my own patches, but this is simple enough...
| Assignee | ||
Comment 36•16 years ago
|
||
Comment on attachment 361927 [details] [diff] [review]
Windows fix?
I'm not sure I should be approving my own patches, but this is simple enough...
Attachment #361927 -
Flags: approval1.9.1+
| Assignee | ||
Updated•16 years ago
|
Attachment #362173 -
Flags: approval1.9.1+
| Assignee | ||
Updated•16 years ago
|
Whiteboard: [needs 191 approval] → [needs 191 landing]
Comment 37•16 years ago
|
||
Is the test case supposed to be "faster" after the patch? It's not for me if that is supposed to be the case, actually looks to be a second or two longer.
BUILD: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090216 Minefield/3.2a1pre (.NET CLR 3.5.30729) ID:20090216011536
~B
| Assignee | ||
Comment 38•16 years ago
|
||
I would be very surprised if it made the test any slower.
But I'm not surprised if it doesn't make the test faster, for you. It didn't have any effect for me. But for me the test was already pretty fast. See comments #20, #22 and #30.
Comment 39•16 years ago
|
||
could this have caused bug 479904?
| Assignee | ||
Comment 40•16 years ago
|
||
No (tested via backout).
| Assignee | ||
Comment 41•16 years ago
|
||
Updated•7 years ago
|
Component: Layout: View Rendering → Layout: Web Painting
You need to log in
before you can comment on or make changes to this bug.
Description
•