Closed Bug 896228 Opened 12 years ago Closed 12 years ago

mtransport/NSS failure transferring files over data channels from Windows->mac/Linux

Categories

(Core :: WebRTC: Networking, defect)

22 Branch
x86_64
Windows 7
defect
Not set
major

RESOLVED FIXED
mozilla25
Tracking Status
firefox22 --- wontfix
firefox23 + fixed
firefox24 + fixed
firefox25 + fixed

People

(Reporter: jesup, Assigned: jesup)

Details

(Whiteboard: [WebRTC] [blocking-webrtc-])

Attachments

(7 files)

May be the same as bug 842283. Originally reported by standard8.

Basically, you start a call (Windows <-> Mac/Linux), then drop a (largish) file onto the window on the Windows end. The file will start to transfer and then stop.

I'll attach logs and SCTP pcap dumps (for some reason on my system they don't show direction, but generally after the start the Windows machine is sending DATA, and the linux machine is sending SACK).

To generate these internal SCTP wireshark traces from a logfile:

1. Install and build wireshark from svn
2. Capture logfile with NSPR_LOG_MODULES of datachannel:5,sctp:5
3. grep SCTP__PACKET logfile | text2pcap -n -D -l 248 -t "%H:%M:%S." - log.pcapng

Copy of email from Michael Tuexen:

On Jul 19, 2013, at 11:20 PM, Randell Jesup <rjesup@mozilla.com> wrote:
> <datachannel_hang_win2.log.gz><datachannel_hang_linux2.log.gz>

OK, I guess the problem is on the Windows side. The log says

    I 17:18:46.478000 0000 13 88 13 89 96 46 4a 51 7c 9e 44 b2 03 00 00 10 d5 7f 40 24 00 01 c5 e0 00 00 00 00 # SCTP_PACKET
    17464[343ce80]: Ok, Common input processing called, m:13EC1D08 iphlen:0 offset:12 length:28 stcb:0D8C4AB8
    17464[343ce80]: stcb:0D8C4AB8 state:8
    17464[343ce80]: sctp_process_control: iphlen=0, offset=12, length=28 stcb:0D8C4AB8
    17464[343ce80]: sctp_process_control: processing a chunk type=3, len=16
    17464[343ce80]: SCTP_SACK
    17464[343ce80]: SCTP_SACK process cum_ack:d57f4024 num_seg:0 a_rwnd:116192
    17464[343ce80]: Check for chunk output prw:0 tqe:1 tf=94656
    17464[343ce80]: Couldn't send media on 'e4e576e9f61dd35d/stream2/video'
    17464[343ce80]: Flow[e4e576e9f61dd35d:2,rtp(none)]; Layer[dtls]: NSS Error -12216
    17464[343ce80]: Data transport state: 5

The processed SACK is the last packet being received. I guess that the consequence of the reported NSS error (SSL_ERROR_SOCKET_WRITE_FAILURE) is that no packets can be received anymore. That would explain the behaviour. This NSS error does not show up in the log of the Linux host.
So the question is why NSS reports a socket error... I need to look at the NSS code. Do you have an idea?
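For readers following the log, the hex dump in the email can be decoded by hand. The sketch below assumes the standard SCTP layout from RFC 4960 (a 12-byte common header followed by a SACK chunk with type, flags, length, cumulative TSN ack, a_rwnd, gap-block count and duplicate-TSN count, all big-endian). The struct and function names are illustrative only and are not taken from usrsctp or Firefox.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative container for the fixed part of a SACK chunk. */
typedef struct {
    uint8_t  type;
    uint8_t  flags;
    uint16_t length;
    uint32_t cum_tsn_ack;
    uint32_t a_rwnd;
    uint16_t num_gap_blocks;
    uint16_t num_dup_tsns;
} sack_chunk;

/* Read big-endian (network order) integers from a byte buffer. */
static uint16_t be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse the first chunk of an SCTP packet as a SACK.
 * Returns 0 on success, -1 if the buffer is too short,
 * the chunk is not a SACK, or its length field is inconsistent. */
int parse_sack(const uint8_t *pkt, size_t len, sack_chunk *out) {
    const size_t common_header = 12; /* ports, verification tag, checksum */
    const size_t sack_fixed = 16;    /* fixed part of a SACK chunk */

    if (len < common_header + sack_fixed)
        return -1;

    const uint8_t *c = pkt + common_header;
    if (c[0] != 3) /* chunk type 3 is SACK */
        return -1;

    out->type           = c[0];
    out->flags          = c[1];
    out->length         = be16(c + 2);
    out->cum_tsn_ack    = be32(c + 4);
    out->a_rwnd         = be32(c + 8);
    out->num_gap_blocks = be16(c + 12);
    out->num_dup_tsns   = be16(c + 14);

    if (out->length < sack_fixed || out->length > len - common_header)
        return -1;
    return 0;
}
```

Fed the 28 bytes from the log line, this yields a chunk of type 3 and length 16 with cum_ack d57f4024, no gap blocks and a_rwnd 116192, matching what sctp_process_control printed.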
I'm guessing the problem is partial writes due to the socket buffers being partly full, and the comment below not being 100% accurate...

    /* We should always have complete writes b/c datagram sockets
     * don't really block */
    if (ss->pendingBuf.len > 0) {
        ssl_MapLowLevelError(SSL_ERROR_SOCKET_WRITE_FAILURE);
        return SECFailure;
    }

I would suggest putting a breakpoint in PORT_SetError() for this error so we can get a stack trace. Or, alternately put a log statement in TransportLayerNSPRAdaptor::Write() for there being a short write.
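A minimal sketch of the kind of short-write logging suggested above. It is not the actual TransportLayerNSPRAdaptor::Write() code: the enum, the function name and the use of POSIX errno values (rather than NSPR error codes) are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical outcome codes for one datagram write attempt. */
typedef enum {
    WRITE_COMPLETE,    /* the whole datagram was accepted */
    WRITE_SHORT,       /* fewer bytes accepted than requested */
    WRITE_WOULD_BLOCK, /* transient: the socket buffer is full */
    WRITE_FAILED       /* any other error */
} write_outcome;

/* Classify the result of a send-style call.
 * rv        : return value of the call (bytes written, or negative on error)
 * requested : number of bytes we asked to write
 * err       : errno-style error code, only meaningful when rv < 0 */
write_outcome classify_datagram_write(long rv, size_t requested, int err) {
    if (rv < 0) {
        if (err == EAGAIN || err == EWOULDBLOCK)
            return WRITE_WOULD_BLOCK;
        fprintf(stderr, "write error: %d\n", err);
        return WRITE_FAILED;
    }
    if ((size_t)rv < requested) {
        /* This is the case the quoted NSS comment assumes never happens. */
        fprintf(stderr, "short write: %ld of %zu bytes\n", rv, requested);
        return WRITE_SHORT;
    }
    return WRITE_COMPLETE;
}
```

Keeping the would-block case separate from the short-write case matters here, because only the latter would contradict the assumption in the NSS comment.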
Windows side (only) got an error -2 in ::Write(). I'll try to get a stack trace tomorrow.
(In reply to Eric Rescorla (:ekr) from comment #5)
> I'm guessing the problem is partial writes due to the socket buffers
> being partly full, and the comment below not being 100% accurate...
>
> /* We should always have complete writes b/c datagram sockets
>  * don't really block */
> if (ss->pendingBuf.len > 0) {
>     ssl_MapLowLevelError(SSL_ERROR_SOCKET_WRITE_FAILURE);
>     return SECFailure;
> }
>
> I would suggest putting a breakpoint in PORT_SetError() for this error
> so we can get a stack trace. Or, alternately put a log statement in
> TransportLayerNSPRAdaptor::Write() for there being a short write.

I hope that Windows is not that broken... How would you signal packet boundaries if the send() calls for UDP are not atomic? But you never know...
No, Windows is not that broken. Using this test program on Windows I have no indication that UDP send are non-atomic. Even incoming ICMP message (destination unreachable / port unreachable) result in an error. So we need to instrument the Firefox code. Something else is going on. BTW: I can't reproduce the issue between a FF running on a Windows VM on a Mac sending to a FF running on the same Mac.
(In reply to Michael Tüxen from comment #8)
> Created attachment 778966 [details]
> Test program for UDP sockets on Windows (sending side)
>
> No, Windows is not that broken. Using this test program
> on Windows I have no indication that UDP send are non-atomic.
> Even incoming ICMP message (destination unreachable /
> port unreachable) result in an error.

I meant: don't result in an error...

> So we need to instrument the Firefox code. Something else is going on.
> BTW: I can't reproduce the issue between a FF running on a Windows VM
> on a Mac sending to a FF running on the same Mac.
I filtered out the RTP packets to make it easier to look at. Taken at the linux end - not the same attempt as the above ones. I'll upload the SCTP traces for this try later
More info:

    15588[3469e18]: Couldn't send media on '95314cfe55bfb02e/stream2/video', error 13
    15588[3469e18]: Flow[95314cfe55bfb02e:2,rtp(none)]; Layer[ice]: SendPacket(1325) failed, res = 2152136707
    15588[3469e18]: *** ::Write() error: -2
    15588[3469e18]: Flow[95314cfe55bfb02e:2,rtp(none)]; Layer[dtls]: NSS Error -12216
    15588[3469e18]: Data transport state: 5
Attachment #779052 - Flags: review?(ekr)
800K, 20MB and 425MB files work (425 MB takes a while to transfer.... and best to be on linux/x64 or Mac 64-bit browser, as it's all one blob). :-)
Comment on attachment 779052
in nicer, return WOULDBLOCK if NSPR SendTo() would block

[Approval Request Comment]
Preemptive request for uplift. ekr and I have already discussed the exact patch, but he had to go to sleep.

Bug caused by (feature/regressing bug #): N/A

User impact if declined: large (100's of K) transfers in DataChannels will hang on Windows. Significant break to the feature for a major usecase (file transfer). Current file xfer sites are working around it with manual throttling and JS re-framing into smaller packets.

Testing completed (on m-c, etc.): Locally tested (up to a 425 MB file).

Risk to taking this patch (and alternatives if risky): Virtually none. Patch is basically a 1-liner. It causes us to retry if we see a WOULD_BLOCK error from PR_SendTo instead of just failing.

String or IDL/UUID changes made by this patch: none
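The idea behind the patch (report a would-block condition as retryable instead of as a hard failure) can be sketched as follows. This is an illustration under stated assumptions, not the attached diff: the result codes, the send_fn callback type and the fake_socket test double are hypothetical, and POSIX errno values stand in for the NSPR error codes the real code checks after PR_SendTo.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical result codes; the real nICEr/NSPR codes differ. */
enum { SEND_OK = 0, SEND_WOULD_BLOCK = 1, SEND_FAILED = 2 };

/* Hypothetical low-level send callback:
 * returns bytes sent, or -1 with *err set to an errno-style code. */
typedef long (*send_fn)(void *ctx, const void *buf, size_t len, int *err);

/* Send one datagram and map the low-level result to a caller-visible code.
 * A would-block error is reported separately so the caller can retry later
 * instead of tearing down the transport. */
int send_datagram(send_fn do_send, void *ctx, const void *buf, size_t len) {
    int err = 0;
    long rv = do_send(ctx, buf, len, &err);

    if (rv < 0) {
        if (err == EAGAIN || err == EWOULDBLOCK)
            return SEND_WOULD_BLOCK; /* transient: socket buffer is full */
        return SEND_FAILED;          /* anything else is a real failure */
    }
    return ((size_t)rv == len) ? SEND_OK : SEND_FAILED;
}

/* Test double: reports would-block a fixed number of times, then succeeds. */
typedef struct {
    int blocks_remaining;
    int calls;
} fake_socket;

long fake_send(void *ctx, const void *buf, size_t len, int *err) {
    fake_socket *s = ctx;
    (void)buf; /* payload is irrelevant for the fake */
    s->calls++;
    if (s->blocks_remaining > 0) {
        s->blocks_remaining--;
        *err = EAGAIN;
        return -1;
    }
    return (long)len;
}
```

With a fake_socket that blocks twice, two calls to send_datagram return SEND_WOULD_BLOCK and the third returns SEND_OK, which is the behaviour a retrying caller relies on.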
Attachment #779052 - Flags: approval-mozilla-beta?
Attachment #779052 - Flags: approval-mozilla-aurora?
Comment on attachment 779052
in nicer, return WOULDBLOCK if NSPR SendTo() would block

Review of attachment 779052:
-----------------------------------------------------------------

lgtm
Attachment #779052 - Flags: review?(ekr) → review+
Blocks: Talkilla
Will come back for uplift approval when this is on central - it can go into Thursday's beta.
Assignee: nobody → rjesup
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [WebRTC] [blocking-webrtc-] → [WebRTC] [blocking-webrtc-][webrtc-uplift]
Attachment #779052 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #779052 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Tested it now on version 23; it does indeed look to be fixed.
Can someone please provide a link to a test page which allows both video/audio communication and file transfer? We tried by dragging a file over a WebRTC call window but it opened the file in FF and ended the call.
This was originally tested with Talkilla IIRC, and also seen with sharefest.me (however, as that's a production site it's unlikely that the exact code originally used in the dup of this bug is still active). Even the recent external verification might have involved code changes relative to the public site. File transfer on drop is a behavior of the JS app. Sharefest has no calls per se, it's only for transferring data.