Closed
      
        Bug 1072150
      
      
        Opened 11 years ago
          Closed 10 years ago
      
        
    
  
Assert against accessing the subject principal when no AutoJSAPI is on the stack 
    Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
        RESOLVED
        FIXED
        
    
  
        
            mozilla44
        
    
  
| Tracking | Status | |
|---|---|---|
| firefox44 | --- | fixed | 
People
(Reporter: bholley, Assigned: bholley)
References
(Depends on 1 open bug)
Details
(Keywords: sec-other, Whiteboard: [post-critsmash-triage][adv-main44-])
Attachments
(4 files)
| 871 bytes,
          patch         | bzbarsky
:
              
              review+ | Details | Diff | Splinter Review | 
| 3.29 KB,
          patch         | bzbarsky
:
              
              review+ | Details | Diff | Splinter Review | 
| 43.19 KB,
          patch         | bzbarsky
:
              
              review+ | Details | Diff | Splinter Review | 
| 1.08 KB,
          patch         | bzbarsky
:
              
              review+ | Details | Diff | Splinter Review | 
Our current architecture for computing the subject principal relies in inspecting the current compartment of JS execution (and assuming System Principal if there is none). This generally works well, but is easy to screw up in the case where code dispatches async runnables to continue its work.
One simple idea I had is for the nsRunnable constructor to detect whether an AutoJSAPI is active at the time the runnable is constructed. If so, the caller is required to either pass the runnable an nsIGlobalObject (whose compartment it will enter before invoking Run()) or explicitly invoke something like AllowDispatchWithoutSecurityToken(). If neither of these things happens before the runnable is dispatched, we assert.
Thoughts? We probably couldn't do this for a few months until we've fully switched everything over to AutoJSAPI, but that shouldn't be too far off.
| Comment 1•11 years ago
           | ||
nsRunnable is often used for cross-thread communication too, so please don't change that.
Perhaps we need nsSameThreadRunnable or such.
But anyhow, do we actually have any case where this is an issue.
Currently runnables behave like "native code running, no js on the stack" and changing that
would break stuff, I'm pretty sure.
| Assignee | ||
| Comment 2•11 years ago
           | ||
(In reply to Olli Pettay [:smaug] from comment #1)
> nsRunnable is often used for cross-thread communication too, so please don't
> change that.
It would still be useful though, right? In the worker/main-thread case, we would pass the nsIGlobalObject (or similar) that would be used on the other thread. And when dispatching to threads where it doesn't make sense, we just use the explicit opt-out.
> But anyhow, do we actually have any case where this is an issue.
This is the biggest complaint I've heard from critics of IsCallerChrome() (i.e. bent and jonas), so I'm assuming that it is. Ben/Jonas, can you confirm?
> Currently runnables behave like "native code running, no js on the stack"
> and changing that
> would break stuff, I'm pretty sure.
Well, this patch would just require us to audit the cases that might be problematic, and make the right decision in each one.
|   | ||
| Comment 3•11 years ago
           | ||
I'd really like to understand the set of problems we're trying to solve.
From my point of view, introspection of the subject principal anywhere other than the very entry point of algorithms seems like a bug to me...
| Comment 4•11 years ago
           | ||
(In reply to Bobby Holley (:bholley) from comment #2)
> It would still be useful though, right? In the worker/main-thread case, we
> would pass the nsIGlobalObject (or similar) that would be used on the other
> thread.
And crash immediately. global objects aren't thread safe, they are something cyclecollactable
> 
> Well, this patch would just require us to audit the cases that might be
> problematic, and make the right decision in each one.
We have tons of nsRunnable usages. (including all the runnable methods etc.)
|   | ||
| Comment 5•11 years ago
           | ||
Assuming there are less stray subject principal checks than nsRunnables, what if instead you asserted, whenever subject principal was checked, that *some* Auto* guard had been entered since the event loop.  The idea being that you are asserting that someone higher up on the callstack had considered the question of what the current principal was.
| Comment 6•11 years ago
           | ||
Well, on the main thread we do http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/nsXPConnect.cpp?rev=e1f3be2c48f6&mark=1066-1066#1057
Everything expects currently that when runnable runs, it is treated like native code/chrome is running.
| Assignee | ||
| Comment 7•11 years ago
           | ||
(In reply to Boris Zbarsky [:bz] from comment #3)
> I'd really like to understand the set of problems we're trying to solve.
Maybe Jonas can weigh in here and describe the scenarios he's concerned about.
> From my point of view, introspection of the subject principal anywhere other
> than the very entry point of algorithms seems like a bug to me...
Does this mean that we should never be inspecting the subject principal when there isn't an active AutoJSAPI on the stack? That seems to contradict what smaug says below:
(In reply to Olli Pettay [:smaug] from comment #6)
> Everything expects currently that when runnable runs, it is treated like
> native code/chrome is running.
Would it in fact break the world if invoking SubjectPrincipal()/IsCallerChrome() without an AutoJSAPI on the stack caused a MOZ_CRASH? If so, then from Boris' perspective we have a lot of bugs.
|   | ||
| Comment 8•11 years ago
           | ||
> Does this mean that we should never be inspecting the subject principal when there isn't 
> an active AutoJSAPI on the stack?
Pretty much, yes.
> If so, then from Boris' perspective we have a lot of bugs.
Yes, I think we do.
| Assignee | ||
| Comment 9•11 years ago
           | ||
(In reply to Boris Zbarsky [:bz] from comment #8)
> > Does this mean that we should never be inspecting the subject principal when there isn't 
> > an active AutoJSAPI on the stack?
> 
> Pretty much, yes.
Ok. Then that's a very straightforward (if perhaps tedious) thing to start fixing.
| Assignee | ||
| Comment 10•11 years ago
           | ||
Basically, we just make SubjectPrincipal() MOZ_CRASH in those situations and fix things until the tree is green. That solve the runnable case and makes our code generally better to boot.
|   | ||
| Comment 11•11 years ago
           | ||
Maybe I should expand on my position, just to make it clear.
The fundamental issue we've run into over and over is that we will have some API, call it Foo(), that is called both from script and for our own internal purposes.  Then we try to tell these two cases apart by checking the subject principal, and requiring "internal purposes" callers to push a null JSContext or use an AutoNoJSAPI or whatever.
But the fundamental bug here, it seems to me, is the lack of differentiation between the "called by someone who may not have permissions to do this, check whether they do" case and the "just do this thing" case.
In the XPCOM world, we were basically forced into this, since there was only one possible API entrypoint: the XPCOM method (getter, setter, whatever).
But in the Web IDL world, we're not forced into doing this.  We can, and imo should, expose different methods for JS and C++ to call into.  We can also hoist checks into bindings code as needed, effectively making the binding method the "JS method that does the security check" and the C++ impl as the "Just do it" method.  We've done his very successfully with [ChromeOnly], for example, which obviates the need for a huge number of IsCallerChrome() checks in the C++.
Maybe I'm missing something, of course, in which case I'd like people to tell me what that is.  But it seems to me like my proposal would eliminate the spooky action at a distance issues we get when you call some code that calls some other code that then happens to do a security check because it can happen to get invoked directly from JS, and suddenly you're screwed.
| Assignee | ||
| Comment 12•11 years ago
           | ||
bz++. Does the action plan in comment 10 sound good?
|   | ||
| Comment 13•11 years ago
           | ||
The plan of comment 10 sounds pretty good, yeah.
| Assignee | ||
| Updated•11 years ago
           | 
Summary: Add stronger enforcement for inheriting security characteristics in async runnables → Assert against accessing the subject principal when no AutoJSAPI is on the stack
| Assignee | ||
| Comment 14•11 years ago
           | ||
| Assignee | ||
| Comment 15•11 years ago
           | ||
| Assignee | ||
| Comment 16•11 years ago
           | ||
| Assignee | ||
| Comment 17•11 years ago
           | ||
| Assignee | ||
| Comment 18•11 years ago
           | ||
| Assignee | ||
| Comment 19•11 years ago
           | ||
(In reply to Boris Zbarsky [:bz] from comment #3)
> From my point of view, introspection of the subject principal anywhere other
> than the very entry point of algorithms seems like a bug to me...
I agree with this. Though it's not clear to me what "algorithms" mean here exactly.
I would say "APIs" rather than "algorithms".
One tricky scenario is when one API/algorithm calls another. For example API/algorithm A might be implementing part of its functionality by calling API/algorithm B.
That means that while it's "the very entry point" of B, it's in the middle of A. Which per bz's comment above would be a bad idea (which I agree with).
So does that mean that A shouldn't be calling B? And that we should break out B's implementation into some function that can be called by both A and B? That way B can do its subject principal check before calling that function. 
This is why I think explicitly passing something representing the subject principal is a good idea, rather than grabbing it from a global. At least we can see in the code that some function that's about to use the subject principal is called. And we can either make sure that the correct subject principal is used, or that the code needs to be refactored so that subject principal checks only do happen at "the very entry point".
|   | ||
| Comment 21•11 years ago
           | ||
> One tricky scenario is when one API/algorithm calls another.
I would argue this sort of thing is a bug, unless a spec explicitly says to do that.
> And that we should break out B's implementation into some function that can be called by
> both A and B?
If we're just using B for some sort of internal stuff when called from A, yes.
| Assignee | ||
| Comment 22•11 years ago
           | ||
| Assignee | ||
| Comment 23•11 years ago
           | ||
| Assignee | ||
| Comment 24•11 years ago
           | ||
| Assignee | ||
| Comment 25•11 years ago
           | ||
| Assignee | ||
| Comment 26•11 years ago
           | ||
I guess posting try pushes is probably pretty annoying given the encrypted bugmail, so I'll stop posting them until I get one that's green.
| Updated•10 years ago
           | 
Group: core-security → dom-core-security
| Assignee | ||
| Comment 27•10 years ago
           | ||
I can't believe we do this.
        Attachment #8665136 -
        Flags: review?(bzbarsky)
| Assignee | ||
| Comment 28•10 years ago
           | ||
        Attachment #8665137 -
        Flags: review?(bzbarsky)
| Assignee | ||
| Comment 29•10 years ago
           | ||
        Attachment #8665138 -
        Flags: review?(bzbarsky)
| Assignee | ||
| Comment 30•10 years ago
           | ||
        Attachment #8665139 -
        Flags: review?(bzbarsky)
| Assignee | ||
| Comment 31•10 years ago
           | ||
| Assignee | ||
| Comment 32•10 years ago
           | ||
For the curious, I was inspired to finish this up by bug 1201747, which this would have caught.
|   | ||
| Comment 33•10 years ago
           | ||
Comment on attachment 8665136 [details] [diff] [review]
Part 1 - Don't examine the subject principal in CheckSameOrigin. v1
Hmm.  Looks like nowadays the only caller other than XUL stuff is treewalker.
I guess we might not have extensions or chrome changing which document a treewalker is walking.. but watch out for regressions.  :(
r=me
        Attachment #8665136 -
        Flags: review?(bzbarsky) → review+
|   | ||
| Comment 34•10 years ago
           | ||
Comment on attachment 8665137 [details] [diff] [review]
Part 2 - Introduce a transitional legacy API that works like things used to. v1
>+  // (whose access needed to be checked) and internal C++ platform code (whose
s/and internal/or internal/
r=nicecomments
        Attachment #8665137 -
        Flags: review?(bzbarsky) → review+
|   | ||
| Comment 35•10 years ago
           | ||
Comment on attachment 8665138 [details] [diff] [review]
Part 3 - Use the opt-out for various sloppy consumers. v1
r=me, but how do we envision things like DispatchDOMEvent working in the long term?
Also, we can probably remove a bunch of those checks in WindowUtils, in followups.
        Attachment #8665138 -
        Flags: review?(bzbarsky) → review+
|   | ||
| Comment 36•10 years ago
           | ||
Comment on attachment 8665139 [details] [diff] [review]
Part 4 - Crash in SubjectPrincipal if there's no active AutoJSAPI. v1
r=me, but we really should get followups for all that Legacy* stuff.
        Attachment #8665139 -
        Flags: review?(bzbarsky) → review+
| Assignee | ||
| Comment 37•10 years ago
           | ||
(In reply to Boris Zbarsky [:bz] from comment #35)
> Comment on attachment 8665138 [details] [diff] [review]
> Part 3 - Use the opt-out for various sloppy consumers. v1
> 
> r=me, but how do we envision things like DispatchDOMEvent working in the
> long term?
Naively, by having a separate API entry point for the scripted case, and doing the security check there. If we need to leave a couple of these in for a while, I don't think it costs us very much.
> Also, we can probably remove a bunch of those checks in WindowUtils, in
> followups.
Yeah, though I'd kinda rather wait until we turn off XPConnect for the web.
| Assignee | ||
| Updated•10 years ago
           | 
Assignee: nobody → bobbyholley
|   | ||
| Comment 38•10 years ago
           | ||
Sounds like a plan on both counts.
| Assignee | ||
| Comment 39•10 years ago
           | ||
I filed bug 1208164 for the followups.
| Comment 40•10 years ago
           | ||
Comment on attachment 8665137 [details] [diff] [review]
Part 2 - Introduce a transitional legacy API that works like things used to. v1
Review of attachment 8665137 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/base/nsContentUtils.h
@@ +224,5 @@
> +  // To enforce this and catch bugs, nsContentUtils::SubjectPrincipal will crash
> +  // if it is invoked without script on the stack. To land that transition, it
> +  // was necessary to go through and whitelist a bunch of callers that were
> +  // depending on the old behavior. Those callers should be fixed up, and these
> +  // methods should not be used by new code without review from bholley or bz.
Should this be "a Privilege Manager peer" or something similar rather than mention the two of you by name? ;)
| Assignee | ||
| Comment 41•10 years ago
           | ||
(In reply to Andrew McCreight [:mccr8] from comment #40)
> Comment on attachment 8665137 [details] [diff] [review]
> Part 2 - Introduce a transitional legacy API that works like things used to.
> v1
> 
> Review of attachment 8665137 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/base/nsContentUtils.h
> @@ +224,5 @@
> > +  // To enforce this and catch bugs, nsContentUtils::SubjectPrincipal will crash
> > +  // if it is invoked without script on the stack. To land that transition, it
> > +  // was necessary to go through and whitelist a bunch of callers that were
> > +  // depending on the old behavior. Those callers should be fixed up, and these
> > +  // methods should not be used by new code without review from bholley or bz.
> 
> Should this be "a Privilege Manager peer" or something similar rather than
> mention the two of you by name? ;)
Maybe, but I think in practice this is probably easier for people to figure out who to actually flag than a hypothetical module made up of the people who are paying attention to this issue. ;-)
I'm hoping that this will be a relatively temporary setup, so hopefully it won't be too long-lived.
| Assignee | ||
| Comment 42•10 years ago
           | ||
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/a99bc6984c86
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/7f547de6c16c
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/4d962ac45aee
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/f23c63830ae5
| Comment 43•10 years ago
           | ||
(In reply to Bobby Holley (:bholley) from comment #41)
> I'm hoping that this will be a relatively temporary setup, so hopefully it
> won't be too long-lived.
Oh, good point. I wasn't think about that aspect of it.
|   | ||
| Comment 44•10 years ago
           | ||
https://hg.mozilla.org/mozilla-central/rev/a99bc6984c86
https://hg.mozilla.org/mozilla-central/rev/7f547de6c16c
https://hg.mozilla.org/mozilla-central/rev/4d962ac45aee
https://hg.mozilla.org/mozilla-central/rev/f23c63830ae5
Status: NEW → RESOLVED
Closed: 10 years ago
          status-firefox44:
          --- → fixed
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
| Comment 45•10 years ago
           | ||
nsContentUtils::IsCallerChrome() is now rather broken, I guess on purpose, but this bug changed its meaning from what it has meant for years, which has obviously lead to tons of MOZ_CRASH crashes.
I'm about to back https://hg.mozilla.org/mozilla-central/rev/f23c63830ae5 out so that Nightlies will be useable again.
The crashes are rather obvious, so I don't think the MOZ_CRASH is needed there atm.
| Comment 46•10 years ago
           | ||
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Comment 47•10 years ago
           | ||
| Assignee | ||
| Comment 48•10 years ago
           | ||
(In reply to Olli Pettay [:smaug] from comment #45)
> nsContentUtils::IsCallerChrome() is now rather broken, I guess on purpose,
> but this bug changed its meaning from what it has meant for years, which has
> obviously lead to tons of MOZ_CRASH crashes.
Yes, that's how we find and annotate the callsites that need to be fixed.
It would have been more helpful to land/merge the crash fixes in the bugs (bug 1208517, bug 1208622, and bug 1208815), but oh well.
When those bugs hit central, I'll repush this patch.
Flags: needinfo?(bobbyholley)
| Comment 49•10 years ago
           | ||
(In reply to Bobby Holley (:bholley) from comment #48)
> It would have been more helpful to land/merge the crash fixes in the bugs
> (bug 1208517, bug 1208622, and bug 1208815), but oh well.
Sure, but I just wanted to get the Nightlies non-crash-y asap and didn't have time to go through
all the IsCallerChrome() callers.
|   | ||
| Comment 50•10 years ago
           | ||
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
|   | ||
| Comment 51•10 years ago
           | ||
Merging a backout shouldn't re-close the bug... ;)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Assignee | ||
| Updated•10 years ago
           | 
Flags: needinfo?(bobbyholley)
| Assignee | ||
| Updated•10 years ago
           | 
Flags: needinfo?(bobbyholley)
| Assignee | ||
| Updated•10 years ago
           | 
Flags: needinfo?(bobbyholley)
| Assignee | ||
| Comment 52•10 years ago
           | ||
With the various fixes landed, Telemetry shows this case being hit in .1% of active runs. I think that's good enough to re-land. I'll  watch crash-stacks and fix any stragglers.
Flags: needinfo?(bobbyholley)
| Assignee | ||
| Comment 53•10 years ago
           | ||
| Assignee | ||
| Comment 54•10 years ago
           | ||
Note that I left the telemetry in and made this crash |#ifndef RELEASE_BUILD|, so that we can avoid any last-minute panics on release-day. I'll let the Telemetry ride one train ahead of the crash.
|   | ||
| Comment 55•10 years ago
           | ||
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
| Updated•10 years ago
           | 
Group: dom-core-security → core-security-release
|   | ||
| Updated•9 years ago
           | 
Whiteboard: [post-critsmash-triage]
| Updated•9 years ago
           | 
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main44-]
Depends on: 1275003
| Updated•9 years ago
           | 
Group: core-security-release
| Updated•6 years ago
           | 
Component: DOM → DOM: Core & HTML
          You need to log in
          before you can comment on or make changes to this bug.
        
Description
•