Closed
Bug 896228
Opened 12 years ago
Closed 12 years ago
mtransport/NSS failure transferring files over data channels from Windows->mac/Linux
Categories
(Core :: WebRTC: Networking, defect)
Tracking
()
RESOLVED
FIXED
mozilla25
People
(Reporter: jesup, Assigned: jesup)
References
Details
(Whiteboard: [WebRTC] [blocking-webrtc-])
Attachments
(7 files)
928.84 KB, text/plain
587.04 KB, text/plain
222.52 KB, application/octet-stream
136.81 KB, application/octet-stream
2.69 KB, text/plain
159.12 KB, application/octet-stream
936 bytes, patch (ekr: review+; lsblakk: approval-mozilla-aurora+; lsblakk: approval-mozilla-beta+)
May be the same as bug 842283
Originally reported by standard8. Basically, you start a call (Windows <-> Mac/Linux), then drop a (largish) file onto the window on the Windows end. The file will start to transfer and then stop.
I'll attach logs and SCTP pcap dumps. For some reason they don't show direction on my system, but generally after the start the Windows machine is sending DATA and the Linux machine is sending SACK.
To generate these internal SCTP wireshark traces from a logfile:
Install and build wireshark from svn
Capture logfile with NSPR_LOG_MODULES of datachannel:5,sctp:5
grep SCTP_PACKET logfile | text2pcap -n -D -l 248 -t "%H:%M:%S." - log.pcapng
Copy of email from Michael Tuexen:
On Jul 19, 2013, at 11:20 PM, Randell Jesup <rjesup@mozilla.com> wrote:
>
> <datachannel_hang_win2.log.gz><datachannel_hang_linux2.log.gz>
OK, I guess the problem is on the Windows side. The log says
I 17:18:46.478000 0000 13 88 13 89 96 46 4a 51 7c 9e 44 b2 03 00 00 10 d5 7f 40 24 00 01 c5 e0 00 00 00 00 # SCTP_PACKET
17464[343ce80]: Ok, Common input processing called, m:13EC1D08 iphlen:0 offset:12 length:28 stcb:0D8C4AB8
17464[343ce80]: stcb:0D8C4AB8 state:8
17464[343ce80]: sctp_process_control: iphlen=0, offset=12, length=28 stcb:0D8C4AB8
17464[343ce80]: sctp_process_control: processing a chunk type=3, len=16
17464[343ce80]: SCTP_SACK
17464[343ce80]: SCTP_SACK process cum_ack:d57f4024 num_seg:0 a_rwnd:116192
17464[343ce80]: Check for chunk output prw:0 tqe:1 tf=94656
17464[343ce80]: Couldn't send media on 'e4e576e9f61dd35d/stream2/video'
17464[343ce80]: Flow[e4e576e9f61dd35d:2,rtp(none)]; Layer[dtls]: NSS Error -12216
17464[343ce80]: Data transport state: 5
The processed SACK is the last packet being received. I guess that the
consequence of the reported NSS error (SSL_ERROR_SOCKET_WRITE_FAILURE)
is that no packets can be received anymore. That would explain the behaviour.
This NSS error does not show up in the log of the Linux host.
So the question is why NSS reports a socket error... I need to look at the
NSS code. Do you have an idea?
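As a quick cross-check of the error code quoted in the log above, the sketch below computes how far -12216 sits from the start of the SSL error range. It assumes NSS defines `SSL_ERROR_BASE` as `-0x3000` in `sslerr.h`; the constant is redefined locally so the sketch builds without NSS headers, and the helper names `ssl_error_offset` and `print_ssl_error` are hypothetical, not NSS functions.

```c
#include <stdio.h>

/* Assumption: NSS's sslerr.h defines SSL_ERROR_BASE as -0x3000 (-12288).
 * Redefined here under a different name so no NSS headers are needed. */
#define SKETCH_SSL_ERROR_BASE (-0x3000)

/* Hypothetical helper: offset of an SSL error code from the base. */
static int ssl_error_offset(int nss_error)
{
    return nss_error - SKETCH_SSL_ERROR_BASE;
}

/* Hypothetical helper: print the code as logged, plus its offset. */
static void print_ssl_error(int nss_error)
{
    printf("NSS Error %d = SSL_ERROR_BASE + %d\n",
           nss_error, ssl_error_offset(nss_error));
}
```

Under that assumption, `ssl_error_offset(-12216)` is 72, which is the slot the log attributes to SSL_ERROR_SOCKET_WRITE_FAILURE.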
| Assignee | ||
Comment 1•12 years ago
|
||
| Assignee | ||
Comment 2•12 years ago
|
||
| Assignee | ||
Comment 3•12 years ago
|
||
| Assignee | ||
Comment 4•12 years ago
|
||
Comment 5•12 years ago
|
||
I'm guessing the problem is partial writes due to the socket buffers
being partly full, and the comment below not being 100% accurate...
/* We should always have complete writes b/c datagram sockets
* don't really block */
if (ss->pendingBuf.len > 0) {
ssl_MapLowLevelError(SSL_ERROR_SOCKET_WRITE_FAILURE);
return SECFailure;
}
I would suggest putting a breakpoint in PORT_SetError() for this error
so we can get a stack trace. Or, alternately put a log statement in
TransportLayerNSPRAdaptor::Write() for there being a short write.
| Assignee | ||
Comment 6•12 years ago
|
||
Windows side (only) got an error -2 in ::Write(). I'll try to get a stack trace tomorrow.
Comment 7•12 years ago
|
||
(In reply to Eric Rescorla (:ekr) from comment #5)
> I'm guessing the problem is partial writes due to the socket buffers
> being partly full, and the comment below not being 100% accurate...
>
> /* We should always have complete writes b/c datagram sockets
> * don't really block */
> if (ss->pendingBuf.len > 0) {
> ssl_MapLowLevelError(SSL_ERROR_SOCKET_WRITE_FAILURE);
> return SECFailure;
> }
>
> I would suggest putting a breakpoint in PORT_SetError() for this error
> so we can get a stack trace. Or, alternately put a log statement in
> TransportLayerNSPRAdaptor::Write() for there being a short write.
I hope that Windows is not that broken... How would you signal packet boundaries
if the send() calls for UDP are not atomic? But you never know...
Comment 8•12 years ago
|
||
Created attachment 778966
Test program for UDP sockets on Windows (sending side)
No, Windows is not that broken. Using this test program
on Windows, I see no indication that UDP sends are non-atomic.
Even incoming ICMP messages (destination unreachable /
port unreachable) result in an error.
So we need to instrument the Firefox code. Something else is going on.
BTW: I can't reproduce the issue between a FF running on a Windows VM
on a Mac sending to a FF running on the same Mac.
Comment 9•12 years ago
|
||
(In reply to Michael Tüxen from comment #8)
> Created attachment 778966 [details]
> Test program for UDP sockets on Windows (sending side)
>
> No, Windows is not that broken. Using this test program
> on Windows I have no indication that UDP send are non-atomic.
> Even incoming ICMP message (destination unreachable /
> port unreachable) result in an error.
I meant: don't result in an error...
> So we need to instrument the Firefox code. Something else is going on.
> BTW: I can't reproduce the issue between a FF running on a Windows VM
> on a Mac sending to a FF running on the same Mac.
| Assignee | ||
Comment 10•12 years ago
|
||
I filtered out the RTP packets to make this easier to look at. It was taken at the Linux end and is not from the same attempt as the logs above. I'll upload the SCTP traces for this attempt later.
| Assignee | ||
Comment 11•12 years ago
|
||
More info:
15588[3469e18]: Couldn't send media on '95314cfe55bfb02e/stream2/video', error 13
15588[3469e18]: Flow[95314cfe55bfb02e:2,rtp(none)]; Layer[ice]: SendPacket(1325) failed, res = 2152136707
15588[3469e18]: *** ::Write() error: -2
15588[3469e18]: Flow[95314cfe55bfb02e:2,rtp(none)]; Layer[dtls]: NSS Error -12216
15588[3469e18]: Data transport state: 5
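The `res = 2152136707` value in the ICE layer line is easier to read in hex (0x80470003). The sketch below splits it into its fields, assuming the usual XPCOM `nsresult` layout (bit 31 is the failure flag, the module number is stored with an offset of 0x45 in the upper half, and the low 16 bits are the code). The struct and function names are hypothetical and only serve this illustration.

```c
#include <stdint.h>

/* Assumption: XPCOM nsresult layout, with the module field stored
 * as (module + 0x45) in the bits above the 16-bit code. */
#define SKETCH_MODULE_BASE_OFFSET 0x45u

/* Hypothetical container for the decoded fields. */
typedef struct {
    unsigned failed;  /* 1 if the failure bit (bit 31) is set */
    unsigned module;  /* module number after removing the offset */
    unsigned code;    /* low 16 bits */
} decoded_result;

/* Split a 32-bit result value into failure flag, module and code. */
static decoded_result decode_result(uint32_t res)
{
    decoded_result d;
    d.failed = (res >> 31) & 0x1u;
    d.module = ((res >> 16) - SKETCH_MODULE_BASE_OFFSET) & 0x1fffu;
    d.code = res & 0xffffu;
    return d;
}
```

For 2152136707 this yields failed = 1, module = 2, code = 3. That appears to be a generic stream error rather than a would-block result, though the mapping of module and code to a named error should be treated as an assumption here.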
| Assignee | ||
Comment 12•12 years ago
|
||
| Assignee | ||
Updated•12 years ago
|
Attachment #779052 -
Flags: review?(ekr)
| Assignee | ||
Comment 13•12 years ago
|
||
800 KB, 20 MB and 425 MB files work (the 425 MB file takes a while to transfer, and it's best to use a Linux x64 or 64-bit Mac browser, as it's all one blob). :-)
| Assignee | ||
Comment 14•12 years ago
|
||
Comment on attachment 779052 [details] [diff] [review]
in nicer, return WOULDBLOCK if NSPR SendTo() would block
[Approval Request Comment]
Preemptive request for uplift. ekr and I have already discussed the exact patch, but he had to go to sleep.
Bug caused by (feature/regressing bug #): N/A
User impact if declined: large transfers (hundreds of KB) over DataChannels will hang on Windows. This is a significant break to the feature for a major use case (file transfer). Current file transfer sites are working around it with manual throttling and JS re-framing into smaller packets.
Testing completed (on m-c, etc.): Locally tested (up to a 425 MB file).
Risk to taking this patch (and alternatives if risky): Virtually none. Patch is basically a 1-liner. It causes us to retry if we see a WOULD_BLOCK error from PR_SendTo instead of just failing.
String or IDL/UUID changes made by this patch: none
Attachment #779052 -
Flags: approval-mozilla-beta?
Attachment #779052 -
Flags: approval-mozilla-aurora?
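The behaviour described in the approval request (report "would block" as its own retryable result instead of folding it into a generic failure) can be sketched as below. This is not the actual patch: the enum names and `classify_send` are hypothetical stand-ins, and the real code works with NSPR's PR_SendTo and the nICEr/mtransport error codes.

```c
#include <stddef.h>

/* Hypothetical result codes (not the nICEr or mtransport names). */
typedef enum {
    SEND_OK = 0,
    SEND_WOULD_BLOCK = 1,  /* transient: caller may retry later */
    SEND_FAILED = 2        /* hard failure */
} send_status;

/* Hypothetical stand-in for the error reported by the socket layer. */
typedef enum {
    LOW_ERR_NONE = 0,
    LOW_ERR_WOULD_BLOCK = 1,
    LOW_ERR_OTHER = 2
} low_level_error;

/* Classify one datagram send attempt.
 * sent is the byte count returned by the send call, or -1 on error. */
static send_status classify_send(long sent, size_t requested,
                                 low_level_error err)
{
    if (sent >= 0 && (size_t)sent == requested)
        return SEND_OK;
    if (sent < 0 && err == LOW_ERR_WOULD_BLOCK)
        return SEND_WOULD_BLOCK;
    /* Any other error, and any short write of a datagram, is fatal. */
    return SEND_FAILED;
}
```

With this split, a caller can keep the packet queued and try again on SEND_WOULD_BLOCK, and tear the transport down only on SEND_FAILED.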
| Assignee | ||
Updated•12 years ago
|
status-firefox22:
--- → wontfix
status-firefox23:
--- → affected
status-firefox24:
--- → affected
status-firefox25:
--- → affected
tracking-firefox23:
--- → ?
tracking-firefox24:
--- → ?
Comment 15•12 years ago
|
||
Comment on attachment 779052 [details] [diff] [review]
in nicer, return WOULDBLOCK if NSPR SendTo() would block
Review of attachment 779052 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm
Attachment #779052 -
Flags: review?(ekr) → review+
| Assignee | ||
Comment 16•12 years ago
|
||
Target Milestone: --- → mozilla25
Updated•12 years ago
|
No longer blocks: Talkilla
tracking-firefox25:
--- → +
Comment 17•12 years ago
|
||
Will come back for uplift approval when this is on central - it can go into Thursday's beta.
Comment 18•12 years ago
|
||
Assignee: nobody → rjesup
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Whiteboard: [WebRTC] [blocking-webrtc-] → [WebRTC] [blocking-webrtc-][webrtc-uplift]
Updated•12 years ago
|
Attachment #779052 -
Flags: approval-mozilla-beta?
Attachment #779052 -
Flags: approval-mozilla-beta+
Attachment #779052 -
Flags: approval-mozilla-aurora?
Attachment #779052 -
Flags: approval-mozilla-aurora+
| Assignee | ||
Comment 19•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/ddfc4afdb7ce
https://hg.mozilla.org/releases/mozilla-beta/rev/64065ae22209
Whiteboard: [WebRTC] [blocking-webrtc-][webrtc-uplift] → [WebRTC] [blocking-webrtc-]
Comment 21•12 years ago
|
||
Tested it now on version 23; it does indeed look fixed.
Comment 22•12 years ago
|
||
Can someone please provide a link to a test page which allows both video/audio communication and file transfer?
We tried by dragging a file over a WebRTC call window but it opened the file in FF and ended the call.
| Assignee | ||
Comment 23•12 years ago
|
||
This was originally tested with Talkilla IIRC, and also seen with sharefest.me (however, as that's a production site, it's unlikely that the exact code originally used in the dup of this bug is still active). Even the recent external verification might have involved code changes relative to the public site.
File transfer on drop is a behavior of the JS app. Sharefest has no calls per se, it's only for transferring data.