Closed Bug 475343 Opened 16 years ago Closed 16 years ago

Region maintenance can get very expensive

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: roc)

Details

(Keywords: fixed1.9.1, perf)

Attachments

(5 files, 1 obsolete file)

Transparent gif for testcase 16 years ago Boris Zbarsky [:bzbarsky] 49 bytes, image/gif		Details
Testcase 16 years ago Boris Zbarsky [:bzbarsky] 4.06 KB, text/html		Details
fix? 16 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 2.48 KB, patch	jaas : review+ roc : approval1.9.1+	Details \| Diff \| Splinter Review
Windows fix? 16 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 2.00 KB, patch	vlad : review+ roc : approval1.9.1+	Details \| Diff \| Splinter Review
fix 16 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 2.09 KB, patch		Details \| Diff \| Splinter Review
GTK precautions 16 years ago Karl Tomlinson (:karlt) 1.60 KB, patch	roc : review+ roc : approval1.9.1+	Details \| Diff \| Splinter Review

Boris Zbarsky [:bzbarsky]

Reporter

Description

•

16 years ago

Quoting from email: ------------------------------------------------------------------------ It seems Firefox is unusually slow compared to Chrome on the following JavaScript Mandelbrot Set: http://warp.povusers.org/MandScripts/javascript2.html (click on Proceed in the bottom) The problem, however, is not in JavaScript component (less than 1% of the cycles is in js3250.dll). Following is the VTune data about retired instructions. A huge 80% of the total cycles is spent in gkgfx, and 85% of gkgfx is in nsRegion.cpp. A lot of rectangle insertions, deletions, and traversals are happening. ------------------------------------------------------------------------ ------------------------------------------------------------------------ It looks like we spend (on my hardware) about 1300ms getting to close(). Opera takes 1500, Safari 400. After that, though, we have a multi-second unresponsive hang, which is presumably what Moh had profiled. ccing roc and vlad, since this stuff is their kettle of fish. It looks like the images load, one by one, and invalidate their rects. There are 10,000 such rects, of course. So we basically end up under [ChildView drawRect:] called from NSView code, and from there spend our time in a 2-1 ratio under nsViewManager::Refresh and nsThebesRegion::Union. The latter is certainly the building up of the region under drawRect. The former is largely the nsRegion::Or call from ConvertNativeRegionToAppRegion,with some smattering of actually painting. The time spent painting is almost entirely under nsDisplaClip::OptimizeVisibility doing region operations. In other words, we spend more time sorting out the right region to paint that it would take to just paint everything. That reminds me strongly of bug 253001. Would it make sense to apply a similar approach here? It looks like the opaque region thing got backed out as part of the display list landing, because all that code went away. But we should probably do it for both damage and opaque regions. ------------------------------------------------------------------------

Flags: wanted1.9.2?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 1

•

16 years ago

Attached image Transparent gif for testcase — Details

Boris Zbarsky [:bzbarsky]

Reporter

Comment 2

•

16 years ago

Attached file Testcase — Details

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

16 years ago

Keywords: perf

Brendan Eich [:brendan]

Comment 3

•

16 years ago

Is there any chance of a 1.9.1 fix? /be

Flags: wanted1.9.1?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 4

•

16 years ago

Yeah.

Assignee: nobody → roc

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

16 years ago

Flags: wanted1.9.1? → wanted1.9.1+

Josh Aas

Updated

•

16 years ago

Priority: -- → P2

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 5

•

16 years ago

For some reason my profiles look nothing like what bz described, although I do get a region with 30,000 rects in it.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 6

•

16 years ago

We use SimpleSubtract in nsDisplayItem::OptimizeVisibility, so we might burn some cycles there unnecessarily but it shouldn't be too bad.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 7

•

16 years ago

Attached patch fix? — Details — Splinter Review

This patch should take the heat off the spots that Boris was talking about. It doesn't seem to make any difference to me, which isn't surprising since I don't see those as hotspots.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 8

•

16 years ago

Boris, was the second section of comment #0 your own testing? If so, maybe you could try this patch.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 9

•

16 years ago

Yes, the second section was my own testing. And yes, this patch makes the beachballing I observed better (though we still seem to have a tad of lag there; not sure where; it's noticeably less snappy than Safari or Opera). Just out of curiosity, what _did_ your profiles look like?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 10

•

16 years ago

Profiling the part after the DOM is constructed with the patch, the time seems to be spent under nsIFrame::Invalidate. In particular, we call nsChildView::Invalidate, which calls [NSView setNeedsDisplayInRect;] which calls [NSRegion addRect], which calls CGSUnionRegionWithRect, which calls shape_union_with_bounds, which calls shape_union. All the time is spent in shape_union, as far as I can see. setNeedsDisplayInRect and everything below it looks like it's part of AppKit or CoreGraphics, so if we're to do something about this we'd need to do it inside the nsChildView::Invalidate method, right?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 11

•

16 years ago

My profiles showed a lot of time in frame construction and most of the rest of the time in inputstream callbacks, processing the image loads I presume. These profiles are in debug builds, which is somewhat bogus but should be OK here. I think I lied in comment #5. I got a display list with 30,000 items in it but I didn't prove that the region has thousands of rectangles. Are you on Tiger or Leopard? If you're on Tiger, I wonder if Leopard is doing some kind of throttling of the size of the update region. Fixing nsChildView::Invalidate to avoid a problem in Appkit would be tricky. We'd have to track our own idea of what the dirty rect is and bash Appkit's rect with a bounding-rect if it's getting too complicated. Ugh.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 12

•

16 years ago

What were you profiling? Your profile sounds like the profile of the script execution, whereas I was profiling the beachball that we get when the image loads come in... I'm on Leopard.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 13

•

16 years ago

What I see after I press the button is a delay, then the document is replaced by the Mandelbrot image overlaid with a grid which I presume is unloaded image alt-rendering, then there's another delay and the grid disappears, I presume because the images have loaded. I profiled both delay periods separately. Maybe I forgot to check deep enough inside the image load stuff, I'll have another go.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 14

•

16 years ago

Yeah, that second delay is what I was profiling. Over here it was 4 seconds or so (compared to 1 sec for the first delay).

Moh Haghighat

Comment 15

•

16 years ago

Also, note that "View -> Reload" is very slow. On Windows, the reload progress is observable almost line by line.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 16

•

16 years ago

I just reran the test. For me it's about 4 seconds for the initial delay and (feels like) less for the second delay. I profiled the second delay with manual stop/start. It might be better if we instrumented the test to automatically start and stop Shark. My profile of the second delay spends about 22% of time painting along two different paths, about 54% of the time in PresShell::WillPaint flushing stuff (30% ProcessPendingRestyles and 23.5% ProcessReflowCommands; I presume this is due to the alt-image stuff being replaced by the real image frames), and 23.5% of the time is in nsInputStreamPump::OnInputStreamReady. In the input stream pump, 15% is OnStopRequest and 8% is OnDataAvailable. The OnStopRequest cost seems to all be secure browser and progress listener work. In OnDataAvailable we end up spending about 6% in nsIFrame::Invalidate ... eventually we spend 1.5% in nsChildView::Invalidate. Focusing on painting we spend 37% of the time under nsCSSRendering::PaintBackground, 11% of the time in nsImageFrame::PaintImage, and the rest (i.e. about 50%) is mostly frittered away on display list stuff, which isn't too bad since this testcase is pessimal for display lists. Remember this is a debug build. I don't know if that explains why my profile is quite different from yours. I guess I'll try with an opt build at some point but it takes a while to make on, I'll need to do it over lunch or at night or something. Do you think it's worth going ahead with the patch I attached?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 17

•

16 years ago

Huh. That's totally different from what I saw... My profiling was in fact with an opt build. The attached patch does help me, so it might be worth doing. That said, it's a patch in Mac-specific code. The original performance issue was observed on Windows, I assume (c.f. comparison to Chrome and use of VTune to profile). Do we need similar fixes for our other widget impls?

Brendan Eich [:brendan]

Comment 18

•

16 years ago

It's rare to find DEBUG code faster than opt, but could that be happening here? roc, you love dogfooding debug builds but they are terrible profiler targets. Can you at least reproduce bz's results for opt? /be

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 19

•

16 years ago

We can do something similar for Windows, sure. I'll do a patch after I've had several hours to build in my Windows VM. Brendan: yeah, I'll do that later.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 20

•

16 years ago

OK, in my Mac opt build, changeset 784122b49e33 (no patches), the delay between displaying the Mandelbrot image covered by the grid of missing-image borders, and then displaying the final image, is less than a second.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 21

•

16 years ago

Maybe it's a network issue.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 22

•

16 years ago

In my Windows opt build with the same changeset, I also see less than a second's delay.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 23

•

16 years ago

Attached patch Windows fix? — Details — Splinter Review

This does for Windows what the other patch does for Mac. I'd be interested in hearing if it helps. Of course, since my builds are already pretty quick for some reason, it doesn't make any difference for me.

Moh Haghighat

Comment 24

•

16 years ago

roc, This is quite an improvement. Previously, on Win32, the distribution of sampled instructions was: Module %INST #INST ------------ ------ -------------- gkgfx.dll 80.52% 31,236,600,000 win32k.sys 6.37% 2,471,600,000 gklayout.dll 4.98% 1,931,200,000 xpcom_core.dll 1.75% 679,400,000 ntoskrnl.exe 1.07% 414,200,000 thebes.dll 0.77% 298,200,000 imglib2.dll 0.76% 293,600,000 ntdll.dll 0.71% 274,800,000 gkparser.dll 0.67% 258,200,000 hal.dll 0.53% 205,400,000 js3250.dll 0.51% 196,600,000 With your patch in Comment #23, the profile is: Module %INST #INST --------------- ----- ------------- gklayout.dll 39.15% 1,909,600,000 xpcom_core.dll 14.63% 713,800,000 ntdll.dll 7.06% 344,600,000 imglib2.dll 6.52% 318,200,000 thebes.dll 6.29% 306,800,000 gkparser.dll 5.29% 258,200,000 js3250.dll 4.39% 214,200,000 The contribution of gkgfx.dll to the exec time has dropped from 81% to 2%. Altogether, now it takes 2 secs to paint. The profile of hot functions in gklayout.dll currently is: Name %INST #INST ---------------------- ----- ----------- SelectorMatches 7.46% 142,400,000 EnumerateAllRules 1.69% 32,200,000 HasStateDependentStyle 1.59% 30,400,000 RuleProcessorData 1.34% 25,600,000 Equals 1.27% 24,200,000 HasFlag 1.25% 23,800,000 GetOffsetTo 1.18% 22,600,000 QueryInterface 1.01% 19,200,000

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 25

•

16 years ago

Comment on attachment 361466 [details] [diff] [review] fix? OK, sounds like we should take this...

Attachment #361466 - Flags: review?(joshmoz)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 26

•

16 years ago

Comment on attachment 361927 [details] [diff] [review] Windows fix? And this...

Attachment #361927 - Flags: review?(vladimir)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 27

•

16 years ago

Attached patch fix (obsolete) — Details — Splinter Review

I haven't even compiled this, but Karl, if you could build it and test it, that'd be great :-)

Attachment #361962 - Flags: review?(mozbugz)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

16 years ago

Whiteboard: [needs review]

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 28

•

16 years ago

Comment on attachment 361927 [details] [diff] [review] Windows fix? This looks fine, but why don't we put MAX_RECTS_IN_REGION in nsBaseWidget.h under a suitable name to avoid duplicating it in 3 different places? Or do you foresee us wanting it to be different on different platforms?

Attachment #361927 - Flags: review?(vladimir) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 29

•

16 years ago

I don't really care, but I don't see a reason why it has to be the same on each platform.

Karl Tomlinson (:karlt)

Comment 30

•

16 years ago

Comment on attachment 361962 [details] [diff] [review] fix Without this patch, the maximum number of rectangles that I'm seeing in the region from the expose event on x86_64/Linux is 14, so this patch is not going to make any difference here. After clicking proceed, I see a delay of 3 seconds before the grid becomes visible, then ~1.5 seconds before the grid disappears. Reloading the page also takes ~4.5 seconds. Are these times slower than expected? If so, I can do some profiling. Are faster times expected on x86/32 systems? (Possibly resulting in more rectangles accumulating.)

Attachment #361962 - Flags: review?(mozbugz)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 31

•

16 years ago

Those times are consistent with what I'm seeing, but not consistent with what Boris and Moh see. If GDK doesn't guarantee to cap the complexity of the update region, I'd actually still like to take the patch; it's simple enough.

Karl Tomlinson (:karlt)

Comment 32

•

16 years ago

Attached patch GTK precautions — Details — Splinter Review

(In reply to comment #31) > If GDK doesn't guarantee to cap the complexity of the update region, I'd > actually still like to take the patch; The GDK code doesn't look like it applies any limit on the complexity. > it's simple enough. This is a little simpler. We don't need to use gdk_region_get_clipbox because the bounding box is already available in the GdkEventExpose. I assume that, if we expect to receive a region with a large number of rects, then we don't want to require a complex clip from cairo? ctx->NewPath(); for (r = rects; r < r_end; ++r) { ctx->Rectangle(gfxRect(r->x, r->y, r->width, r->height)); } ctx->Clip(); This patch clips only to the bounding box when the number of rects in the region is large.

Attachment #361962 - Attachment is obsolete: true

Attachment #362173 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 33

•

16 years ago

Yeah, we don't want a complex clip rect in cairo.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

16 years ago

Attachment #362173 - Flags: review?(roc) → review+

Josh Aas

Updated

•

16 years ago

Attachment #361466 - Flags: review?(joshmoz) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

16 years ago

Whiteboard: [needs review] → [needs landing]

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 34

•

16 years ago

Pushed http://hg.mozilla.org/mozilla-central/rev/98f8a3ecddb6 http://hg.mozilla.org/mozilla-central/rev/c9ea013c5552 http://hg.mozilla.org/mozilla-central/rev/b470ee776528

Status: NEW → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Whiteboard: [needs landing] → [needs 191 approval]

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

16 years ago

Attachment #361466 - Flags: approval1.9.1+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 35

•

16 years ago

Comment on attachment 361466 [details] [diff] [review] fix? I'm not sure I should be approving my own patches, but this is simple enough...

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 36

•

16 years ago

Comment on attachment 361927 [details] [diff] [review] Windows fix? I'm not sure I should be approving my own patches, but this is simple enough...

Attachment #361927 - Flags: approval1.9.1+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

16 years ago

Attachment #362173 - Flags: approval1.9.1+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

16 years ago

Whiteboard: [needs 191 approval] → [needs 191 landing]

Bryan

Comment 37

•

16 years ago

Is the test case supposed to be "faster" after the patch? It's not for me if that is supposed to be the case, actually looks to be a second or two longer. BUILD: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090216 Minefield/3.2a1pre (.NET CLR 3.5.30729) ID:20090216011536 ~B

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 38

•

16 years ago

I would be very surprised if it made the test any slower. But I'm not surprised if it doesn't make the test faster, for you. It didn't have any effect for me. But for me the test was already pretty fast. See comments #20, #22 and #30.

Dietrich Ayala (:dietrich)

Comment 39

•

16 years ago

could this have caused bug 479904?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

16 years ago

Depends on: 479904

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

16 years ago

No longer depends on: 479904

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 40

•

16 years ago

No (tested via backout).

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 41

•

16 years ago

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/a5ad7b6f823d http://hg.mozilla.org/releases/mozilla-1.9.1/rev/839bb11c849e http://hg.mozilla.org/releases/mozilla-1.9.1/rev/52013b510dee

Flags: wanted1.9.2?

Keywords: fixed1.9.1

Whiteboard: [needs 191 landing]

Nobody; OK to take it and work on it

Updated

•

7 years ago

Component: Layout: View Rendering → Layout: Web Painting

You need to log in before you can comment on or make changes to this bug.