Open Bug 1914407 Opened 11 months ago Updated 25 days ago

Investigate navigation errors when navigation gets redirected or interrupted

Categories

(Remote Protocol :: WebDriver BiDi, task, P2)

task
Points:
3

Tracking

(Not tracked)

People

(Reporter: whimboo, Unassigned)

References

(Depends on 1 open bug, Blocks 4 open bugs)

Details

(Whiteboard: [webdriver:m17], [wptsync upstream])

Attachments

(4 files, 2 obsolete files)

Originally filed at: https://github.com/puppeteer/puppeteer/issues/12929

We should check what exactly is causing us to fail the navigation command.

Attached file BCWebProgress:5 log

The problem here seems to be the redirect to https://twitter.com. Not sure how this is done but it looks like it's not a usual redirect, but maybe they manually call window.location as well?

Actually using DevTools network monitor the raw response from loading https://x.com shows indeed that window.location.href is used but there is also a call to replaceState. Not sure which exactly is executed here.

This bug seems to rely on the same underlying discussion as bug 1914405.

Severity: -- → S3
Points: --- → 3
Depends on: 1914405
Priority: -- → P2
Whiteboard: [webdriver:m12]
Blocks: 1914405
No longer depends on: 1914405
Assignee: nobody → jdescottes
Status: NEW → ASSIGNED

I tried to investigate a bit why nsIRequest.canceledReason didn't give any extra info here.
When the load is interrupted by a window.location update, as pointed out by :Nika, the docshell will stop the load at https://searchfox.org/mozilla-central/rev/cc01f11adfacca9cd44a75fd140d2fdd8f9a48d4/docshell/base/nsDocShell.cpp#9388-9401

// Stop any current network activity.
// Also stop content if this is a zombie doc. otherwise
// the onload will be delayed by other loads initiated in the
// background by the first document that
// didn't fully load before the next load was initiated.
// If not a zombie, don't stop content until data
// starts arriving from the new URI...

if ((mDocumentViewer && mDocumentViewer->GetPreviousViewer()) ||
    LOAD_TYPE_HAS_FLAGS(aLoadState->LoadType(), LOAD_FLAGS_STOP_CONTENT)) {
  rv = Stop(nsIWebNavigation::STOP_ALL);
} else {
  rv = Stop(nsIWebNavigation::STOP_NETWORK);
}

And interestingly, we actually cancel the requests with a proper reason "nsDocLoader::Stop". While this wouldn't be enough to differentiate between a stop due to a navigation and one due to the user canceling the navigation (eg clicking on the cancel icon or pressing escape), it would still be interesting to have this reason available to our Navigation helper.

However the request available from the WebProgress in the parent process is a RemoteWebProgressRequest, which is not the real request, and this request doesn't have the proper canceledReason set.

Nika, Valentin: do you know if we could get the canceledReason for the real request and set it correctly on the RemoteWebProgressRequest?

[On the other hand, I tried to check if we could reuse our NavigationManager to detect the new navigation, but sadly in the parent process we seem to get the notification about the new navigation after we get NS_ABORTED.]

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(nika)

(In reply to Julian Descottes [:jdescottes] from comment #4)

Nika, Valentin: do you know if we could get the canceledReason for the real request and set it correctly on the RemoteWebProgressRequest?

RemoteWebProgressRequest is a tiny type used for web progress listeners in the parent process, so doesn't have any extra details on it beyond the basic mURI, mOriginalURI, and mMatchedList (https://searchfox.org/mozilla-central/rev/90dc3743ea284c258148124dc54dfd123e82586f/dom/ipc/RemoteWebProgressRequest.h#29-32). While the mCanceledReason exists because of being on nsIRequest, it is not included in the RequestData struct sent over IPC (https://searchfox.org/mozilla-central/rev/90dc3743ea284c258148124dc54dfd123e82586f/dom/ipc/PBrowser.ipdl#125-130), and therefore isn't transferred across processes.

If this is something which you think would be valuable to have available on remote web progress notifications, we could add it as an extra piece of information being sent down, but we unfortunately can't easily have a "real" channel available on these progress notifications.

Flags: needinfo?(nika) → needinfo?(jdescottes)

Thanks for the feedback, let's discuss this during triage today. I think that would allow us to at least craft a better error message, and potentially wait for the next navigation (although our spec currently doesn't support this).

Whiteboard: [webdriver:m12] → [webdriver:triage][webdriver:m12]

After discussing we agree it would be great to have the canceled reason on this request, and based on this we will emit a better error message. We should also add a wdspec test for this.

Also we should probably write a script to check how BiDi behaves when navigating to eg the top 1000 sites.

Flags: needinfo?(jdescottes)
Flags: needinfo?(valentin.gosu)
Whiteboard: [webdriver:triage][webdriver:m12] → [webdriver:m12]

Interestingly, when writing a wdspec test to reproduce the issue, I also get a failure with chrome:

future     = <Future finished exception=UnknownErrorException(unknown error, navigation canceled, as new navigation is requested by scriptInitiated, Error
    at new UnknownErrorException (<anonymous>:81:4591)
    at <anonymous>:597:6717
    at <anonymous>:1:1010
    at Array.map (<anonymous>)
    at Object.emit (<anonymous>:1:993)
    at MapperCdpClient.emit (<anonymous>:17:387)
    at #onMessage (<anonymous>:823:2977)
    at WindowCdpTransport.window.cdp.onmessage (<anonymous>:888:4763)
    at <anonymous>:1:18)>

Maybe this doesn't occur in puppeteer because of specific feature which gets enabled in Chrome by puppeteer?

And I also get the same when connecting the client from https://juliandescottes.github.io/bidi-web-client/web/ to a regular Chrome instance and doing a navigation to mozilla.org.

So at least the behavior seems consistent in both browsers, although arguably the error message is much more helpful in Chrome.

The canceledReason is added to the RequestData used to build the
RemoteWebProgressRequest. This canceledReason is useful for consumers such
as WebDriverBiDi which need to emit different error messages depending on
the reason why a specific navigation request was canceled.

Comment on attachment 9426551 [details]
Bug 1914407 - Add canceledReason to RemoteWebProgressRequest

Revision D223174 was moved to bug 1921119. Setting attachment 9426551 [details] to obsolete.

Attachment #9426551 - Attachment is obsolete: true
No longer blocks: 1921119
Depends on: 1921119
Points: 3 → 2
Whiteboard: [webdriver:m12] → [webdriver:m13]

Cool! so this is an update on firefox that will help me run puppeteer with it as announced here? https://hacks.mozilla.org/2024/08/puppeteer-support-for-firefox/

Summary: NS_BINDING_ABORTED error when navigating to https://x.com → Improve error message when navigation fails due to a window.location update

(In reply to amunizp from comment #13)

Cool! so this is an update on firefox that will help me run puppeteer with it as announced here? https://hacks.mozilla.org/2024/08/puppeteer-support-for-firefox/

You are already able to use Puppeteer with Firefox via the Webdriver BiDi protocol. It's just that in some cases websites do quirky stuff like setting window.location to a different URL itself to trigger kinda redirect. That currently fails and bug 1921973 got filed for that. Here we are only updating the error message to make it more clear to the user why the call to the browsingContext.navigate command failed.

As mentioned at https://github.com/puppeteer/puppeteer/issues/12929#issuecomment-2426520361, the issue actually also affects Puppeteer with Chrome + WebDriver BiDi. Only Chrome + CDP seems to avoid this issue.

Even with the canceledReason properly added on the request seen from the parent WebProgress, we don't always get the actual canceledReason when a load is interrupted with an NS_BINDING_ABORTED.

A simple test is to use a script doing window.location = someOtherUrl. If the script is an inline script in the page, the document request will have a canceledReason. However if the script is in a different file and loaded from the page, we will still get a NS_BINDING_ABORTED, but the document request is no longer canceled with a reason, and we can't get any extra information about what triggered the failure.

While we still stop the load from nsDocShell as mentioned in comment 4, whether or not this cancelsWithReason the webprogress' request probably depends on the fact that the channel is still open or not. So we can't reliably use the request canceledReason to get the information.

It would be more reliable to add something to the WebProgress here. If we could have a flag/state which is only set when the load is interrupted by a script.

Olli, our goal is to be able to provide more helpful error messages when a navigation is interrupted because of a JS redirect. Do you think we can add a flag on the parent process ProgressListener to do that?

Flags: needinfo?(smaug)

Yesterday tried to check the result of navigating with wait=complete to a list of top1000 websites. Out of this test, only 12 websites failed (2 being twitter/x).

Failed navigation to twitter.com Error: NS_BINDING_ABORTED 
Failed navigation to x.com Error: NS_BINDING_ABORTED 
Failed navigation to dailymotion.com Error: NS_ERROR_ABORT 
Failed navigation to bizjournals.com Error: NS_BINDING_ABORTED 
Failed navigation to www.gov.br Error: A new navigation interrupted an unfinished navigation 
Failed navigation to expedia.co.uk Error: NS_ERROR_ABORT 
Failed navigation to soccerway.com Error: NS_BINDING_ABORTED 
Failed navigation to reclameaqui.com.br Error: Address rejected 
Failed navigation to news.com.au Error: NS_BINDING_ABORTED 
Failed navigation to gouv.qc.ca Error: NS_BINDING_ABORTED 

Interestingly we see that for www.gov.br we have a better error message.
This one actually comes from the NavigationManager, which is monitoring navigations based on the content process webprogress updates.

The NavigationManager will emit navigation-failed events both when receiving error message for failed navigations due to eg content blocking (NS_ERROR_CONTENT_BLOCKED), but also when it detects a new navigation while an ongoing navigation is still in progress, which can be useful here.

The issue is that our parent-process Navigate.sys.mjs will fail the navigation slightly earlier. One option would be to check in the NavigationManager if we have a currently monitored navigation which matches the one we want to fail. In that case we could skip the NS_BINDING_ABORTED and leverage the upcoming failure from the NavigationManager.

Attachment #9426552 - Attachment is obsolete: true
See Also: → 1913327

The script I used for testing navigation to 1000 top sites can be found at https://gist.github.com/juliandescottes/5c6ab10b1d6b574e4a6d80d0ef65c4d3

For comment 17, a few successful navigations were actually incorrect due to Bug 1926894. Doing another run with a 1000ms timeout between each navigation to workaround the issue gave the following results:

Error URL
Navigation was aborted by another navigation https://cedars-sinai.org
Navigation was aborted by another navigation https://eonline.com
Navigation was aborted by another navigation https://espn.com
Navigation was aborted by another navigation https://europa.eu
Navigation was aborted by another navigation https://fc2.com
Navigation was aborted by another navigation https://go.com
Navigation was aborted by another navigation https://gouv.qc.ca
Navigation was aborted by another navigation https://moneycontrol.com
Navigation was aborted by another navigation https://mozilla.org
Navigation was aborted by another navigation https://offerup.com
Navigation was aborted by another navigation https://planetofhotels.com
Navigation was aborted by another navigation https://rakuten.co.jp
Navigation was aborted by another navigation https://soccerway.com
Navigation was aborted by another navigation https://thetimes.com
Navigation was aborted by another navigation https://tutorialspoint.com
Navigation was aborted by another navigation https://twitter.com
Navigation was aborted by another navigation https://udn.com
Navigation was aborted by another navigation https://uptodown.com
Navigation was aborted by another navigation https://vaia.com
Navigation was aborted by another navigation https://vice.com
Navigation was aborted by another navigation https://wattpad.com
Address rejected https://bab.la
Address rejected https://newegg.com
NS_BINDING_ABORTED https://deepl.com
NS_BINDING_ABORTED https://discogs.com
NS_BINDING_ABORTED https://findlaw.com
NS_BINDING_ABORTED https://techtarget.com
NS_BINDING_ABORTED https://vrbo.com
NS_ERROR_ABORT https://123rf.com
NS_ERROR_ABORT https://bbc.co.uk
NS_ERROR_ABORT https://byjus.com
NS_ERROR_ABORT https://com.be
NS_ERROR_ABORT https://linguee.com
NS_ERROR_ABORT https://medlineplus.gov
NS_ERROR_ABORT https://urbandictionary.com
NS_ERROR_ABORT https://zendesk.com
NS_ERROR_CONNECTION_REFUSED https://ouest-france.fr
NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_SECURITY, https://company-information.service.gov.uk SSL_ERROR_BAD_CERT_DOMAIN)
NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_SECURITY, SEC_ERROR_UNKNOWN_ISSUER) https://barnesandnoble.com
NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_SECURITY, SEC_ERROR_UNKNOWN_ISSUER) https://canada.ca
NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_SECURITY, SEC_ERROR_UNKNOWN_ISSUER) https://kompass.com
NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_SECURITY, SEC_ERROR_UNKNOWN_ISSUER) https://liputan6.com
NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_SECURITY, SEC_ERROR_UNKNOWN_ISSUER) https://state.mn.us
NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_SECURITY, SEC_ERROR_UNKNOWN_ISSUER) https://virginia.gov
NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_SECURITY, SSL_ERROR_BAD_CERT_DOMAIN) https://streetdirectory.com
NS_ERROR_NET_INTERRUPT https://tokopedia.com
NS_ERROR_NET_RESET https://nih.gov
NS_ERROR_NET_TIMEOUT https://jalan.net
NS_ERROR_UNKNOWN_HOST https://army.mil
NS_ERROR_UNKNOWN_HOST https://simplyhired.com
NS_ERROR_UNKNOWN_HOST https://toppr.com
NS_ERROR_WONT_HANDLE_CONTENT https://github.com

The results are also slightly inconsistent but overall we have ~20 navigations classified with the new error message as aborted navigations, 5 other NS_BINDING_ABORTED, ~10 NS_ERROR_ABORT, ~10 NS_ERROR_GENERATE_FAILURE, and a few other less frequent errors (Address rejected, NS_ERROR_CONNECTION_REFUSED, NS_ERROR_NET_INTERRUPT, NS_ERROR_NET_RESET, NS_ERROR_NET_TIMEOUT, NS_ERROR_UNKNOWN_HOST, NS_ERROR_WONT_HANDLE_CONTENT).

We should investigate those in follow up bugs.

Pushed by jdescottes@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/59ae3c80d4f7 [wdspec] Fix blank js template for inline helper r=webdriver-reviewers,whimboo https://hg.mozilla.org/integration/autoland/rev/098952454738 [bidi] Send custom error when navigation is aborted because of another navigation r=webdriver-reviewers,whimboo
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/48843 for changes under testing/web-platform/tests
Whiteboard: [webdriver:m13] → [webdriver:m13], [wptsync upstream]

Thanks Julian for the update. Maybe we should have a meta bug with all the details from comment 20?

Backed out for causing (part of commit message)

Flags: needinfo?(jdescottes)
Upstream PR was closed without merging

We have another issue, with 204 No content responses. Those are not properly detected by the navigation manager at the moment, and we don't get a navigation failed event for them.

The reason behind this is that we currently ignore STOP notifications in the NavigationListenerChild if coming from an NS_BINDING_ABORTED, because we assume this is due to a process change and we will get other updates soon. However this is not the case for 204 No Content.

Additionally there is absolutely no difference in the flags or status for the update received for a 204 compared to a NS_BINDING_ABORTED for a process change. Maybe I can get some more information on the request...

Flags: needinfo?(jdescottes)

Nika, Olli,

Using a WebProgressListener in content processes I'm having trouble to tell apart some StateChange updates with STATE_STOP and status Cr.NS_BINDING_ABORTED.

For the most recent example, I will observe this kind of state update when the HTTP response is a 204 No Content, and in this case it will be the last update for this navigation. But I also get it when there is a process change during a navigation (eg navigating from x.com to youtube.com). But in that case we will later get another state change with STATE_STOP but no NS_BINDING_ABORTED. Because of this, I have been ignoring updates with STATE_STOP and NS_BINDING_ABORTED, thinking I should always get "another" update. But it doesn't seem to always be the case, eg with 204 responses. And the request object is not helpful either in case of 204s because it doesn't implement nsIHttpChannel, and we don't have access to the statusCode.

Is there a way we could know whether the STATE_STOP is really the final event for a given navigation. I know we don't really have a comprehensive view of a "navigation" on the platform side, but maybe you have some suggestions which would help here.

Flags: needinfo?(nika)

(In reply to Julian Descottes [:jdescottes] from comment #27)

Using a WebProgressListener in content processes I'm having trouble to tell apart some StateChange updates with STATE_STOP and status Cr.NS_BINDING_ABORTED.

Where are you attaching the web progress listener? Is this being attached in the parent process through the BrowsingContext's WebProgress, or is it being listened to within a content process through the docshell? Theoretically if you're listening in the parent process you shouldn't be seeing any NS_BINDING_ABORTED statuses during process switches, as they should be suppressed internally. Events like that will appear in content processes as the navigation is being aborted there.

Do you have a link to the code which is seeing these mid-navigation NS_BINDING_ABORTED notifications?

Flags: needinfo?(nika) → needinfo?(jdescottes)

(In reply to Nika Layzell [:nika] (ni? for response) from comment #28)

(In reply to Julian Descottes [:jdescottes] from comment #27)

Using a WebProgressListener in content processes I'm having trouble to tell apart some StateChange updates with STATE_STOP and status Cr.NS_BINDING_ABORTED.

Where are you attaching the web progress listener? Is this being attached in the parent process through the BrowsingContext's WebProgress, or is it being listened to within a content process through the docshell? Theoretically if you're listening in the parent process you shouldn't be seeing any NS_BINDING_ABORTED statuses during process switches, as they should be suppressed internally. Events like that will appear in content processes as the navigation is being aborted there.

Do you have a link to the code which is seeing these mid-navigation NS_BINDING_ABORTED notifications?

I am attaching the web progress listener in content processes, mainly because the parent process listener misses to send notifications in some edge cases (data uri iframes for instance, at least that was the case when I tested both approaches 2 years ago at https://bugzilla.mozilla.org/show_bug.cgi?id=1756595#c4).

The code catching mid navigation NS_BINDING_ABORTED is at https://searchfox.org/mozilla-central/rev/f732a1651018b7c32002981a9c8b8613975ffbf9/remote/shared/js-window-actors/NavigationListenerChild.sys.mjs#168-177 (but that's content process so not surprising).

Overall maybe we should just stop using content process listeners here and just try to address the issues we have for edge cases with the parent process approach.

Flags: needinfo?(jdescottes)
See Also: → 1930616
Whiteboard: [webdriver:m13], [wptsync upstream] → [webdriver:m14], [wptsync upstream]
Whiteboard: [webdriver:m14], [wptsync upstream] → [webdriver:m15], [wptsync upstream]
Depends on: 1930616

Unassigning and moving to m16.

Note that based on the conversation on https://github.com/w3c/webdriver-bidi/issues/763#issuecomment-2718673242, the expected behavior is now to not throw when a navigation is aborted by a new navigation.

Maybe with this new behavior it will be ok to simply resolve in case of NS_BINDING_ABORTED, but this requires more investigation.

Assignee: jdescottes → nobody
Status: ASSIGNED → NEW
Whiteboard: [webdriver:m15], [wptsync upstream] → [webdriver:m16], [wptsync upstream]

Clearing the ni for now.

Flags: needinfo?(smaug)
Points: 2 → ---
Whiteboard: [webdriver:m16], [wptsync upstream] → [webdriver:triage][webdriver:backlog], [wptsync upstream]

Per comment 30, we should rediscuss this bug

Depends on: 1837949
Priority: P2 → P3
Whiteboard: [webdriver:triage][webdriver:backlog], [wptsync upstream] → [webdriver:m16:blocked], [wptsync upstream]
Type: defect → task
Points: --- → 2
Priority: P3 → P2
Summary: Improve error message when navigation fails due to a window.location update → Investigate navigation errors when navigation gets redirected or interrupted
Whiteboard: [webdriver:m16:blocked], [wptsync upstream] → [webdriver:m16], [wptsync upstream]
Depends on: 1964369
Whiteboard: [webdriver:m16], [wptsync upstream] → [webdriver:m17], [wptsync upstream]
Points: 2 → 3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: