How to download a file using Selenium's WebDriver?

  • Basically, I want to at least check that a downloadable file exists (that is, that the download link works) and preferably get details such as the file size too.

    Here's an example:

    link = self.browser.find_element_by_link_text('link text')
    href = link.get_attribute('href')
    download = self.browser.get(href)
    print download
    

    That fourth line prints "None", presumably because I haven't manually clicked the Save button, and even if I had, I doubt WebDriver would be able to "see" the file.

    Any ideas? I'm using Firefox as my browser-under-test, and I understand that the file handling for downloads is somewhat browser and/or OS-specific.

    This works when using geb.Page: downloadBytes($(selector)[0].@href)

    One potential solution is to obtain the URL for the file via Selenium, create a (non-Selenium) connection, copy Selenium's cookies to the connection (if necessary), and download the file. Since this method utilizes non-Selenium APIs to download the file, it will work with (or without) any browser. For more info, see my answer here: https://stackoverflow.com/questions/16746707/how-to-download-any-file-and-save-it-to-the-desired-location-using-selenium-webd/46774849#46774849
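
    The cookie-copying approach described above can be sketched in Python with only the standard library. This is a minimal sketch, not the linked answer's code: the helper names `cookie_header`, `build_request` and `download` are made up for illustration, and it assumes the cookies come from Selenium's `get_cookies()`, which returns a list of dicts with `name` and `value` keys.

```python
import shutil
import urllib.request


def cookie_header(selenium_cookies):
    """Join Selenium cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_request(url, selenium_cookies, method="GET"):
    """Build a urllib request that carries the browser's cookies."""
    request = urllib.request.Request(url, method=method)
    header = cookie_header(selenium_cookies)
    if header:
        request.add_header("Cookie", header)
    return request


def download(url, selenium_cookies, dest_path):
    """Stream the response body to dest_path; return the bytes written."""
    request = build_request(url, selenium_cookies)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
        return out.tell()
```

    With the question's variables, a call might look like `download(href, self.browser.get_cookies(), 'downloaded.bin')`, where the destination filename is a placeholder. Sites that also check headers such as User-Agent or Referer may need those copied as well.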

  • Melena (correct answer, 9 years ago)

    As far as I know, there is no easy way to make Selenium download files, because browsers use native dialogs for downloads and these cannot be controlled by JavaScript, so you need some "hack". Check this, hope it helps.

    Thanks. I might try to do it with the Python requests module.

    +1 but "Provide context for links Links to external resources are encouraged, but please add context around the link so your fellow users will have some idea what it is and why it’s there. Always quote the most relevant part of an important link, in case the target site is unreachable or goes permanently offline." as stated here: http://sqa.stackexchange.com/help/how-to-answer

  • Here's a solution. Set Firefox's preferences to save files automatically and not to pop up the downloads window. Then you just navigate to the file, and it'll download.

    So, something like this:

    FirefoxProfile fxProfile = new FirefoxProfile();
    
    fxProfile.setPreference("browser.download.folderList",2);
    fxProfile.setPreference("browser.download.manager.showWhenStarting",false);
    fxProfile.setPreference("browser.download.dir","c:\\mydownloads");
    fxProfile.setPreference("browser.helperApps.neverAsk.saveToDisk","text/csv");
    
    WebDriver driver = new FirefoxDriver(fxProfile);
    driver.navigate().to("http://www.foo.com/bah.csv");
    

    Given that you now have a known download directory, no save prompt, and no download manager appearing, automation from this point should be straightforward.
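
    Once the browser saves files automatically, the test still has to wait for the file to land in the download directory. A rough Python sketch of that wait follows. The function name `wait_for_download` is hypothetical, and it assumes Firefox's convention of keeping in-progress data in a `<filename>.part` file next to the target; other browsers use different temporary names.

```python
import os
import time


def wait_for_download(directory, filename, timeout=30.0, poll_interval=0.5):
    """Return the file's path once it exists and no partial file remains.

    Assumption: the browser writes in-progress data to "<filename>.part"
    and removes that file when the download finishes.
    """
    target = os.path.join(directory, filename)
    partial = target + ".part"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(target) and not os.path.exists(partial):
            return target
        time.sleep(poll_interval)
    raise TimeoutError(f"{filename} did not finish downloading within {timeout}s")
```

    After it returns, `os.path.getsize(path)` gives the file size the question asks about. The check is deliberately simple and could misfire if the target file appears before the partial file does, so treat it as a starting point.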

    Should this answer still be working with Firefox 58? I can't manage to make it work.

    We have been using this for a long time with Firefox pre-version 47 and it worked great. Now we've upgraded to 58 and it no longer works.

    Here it still / again works with the following setting: profile.setPreference("browser.download.dir", "path\\..."); profile.setPreference("browser.download.folderList", 2); profile.setPreference("browser.helperApps.neverAsk.saveToDisk", "text/plain");

    @k-den just in case you missed the latest comment

    @meles too (see above)

  • First of all, why do you want to download the file? Are you going to do anything with it?

    The majority of people who want to download files do it only to show an automation framework downloading files, because it makes somebody non-technical ooh and aah.

    You can check the response headers to confirm that you get a 200 OK (or maybe a redirect, depending on your expected outcome), which tells you that the file exists.

    Only download files if you are actually going to do something with them. If you are downloading them just for the sake of it, you are wasting test time, network bandwidth and disk space.

    Here is my implementation.

    This finds the link on the page and extracts the URL being linked to. It then uses Apache Commons HttpClient to replicate the browser session used by Selenium and downloads the file. There are some instances where it won't work (where the link found on the page does not actually point to the file but to a layer that prevents automated downloads).

    Generally, it works well and is cross-platform/cross-browser compliant.

    The code is:

         /*
          * Copyright (c) 2010-2011 Ardesco Solutions - http://www.ardescosolutions.com
          *
          * Licensed under the Apache License, Version 2.0 (the "License");
          * you may not use this file except in compliance with the License.
          * You may obtain a copy of the License at
          *
          * http://www.apache.org/licenses/LICENSE-2.0
          *
          * Unless required by applicable law or agreed to in writing, software
          * distributed under the License is distributed on an "AS IS" BASIS,
          * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          * See the License for the specific language governing permissions and
          * limitations under the License.
          */
    
        package com.lazerycode.ebselen.customhandlers;
    
        import com.google.common.annotations.Beta;
        import com.lazerycode.ebselen.EbselenCore;
        import com.lazerycode.ebselen.handlers.FileHandler;
        import org.apache.commons.httpclient.*;
        import org.apache.commons.httpclient.cookie.CookiePolicy;
        import org.apache.commons.httpclient.HttpClient;
        import org.apache.commons.httpclient.methods.GetMethod;
    
        import java.io.*;
        import java.net.URL;
        import java.util.Set;
    
        import org.openqa.selenium.WebDriver;
        import org.openqa.selenium.WebElement;
        import org.slf4j.Logger;
        import org.slf4j.LoggerFactory;
    
        @Beta
        public class FileDownloader {
    
            private static final Logger LOGGER = LoggerFactory.getLogger(EbselenCore.class);
            private WebDriver driver;
            private String downloadPath = System.getProperty("java.io.tmpdir");
    
            public FileDownloader(WebDriver driverObject) {
                this.driver = driverObject;
            }
    
            /**
             * Get the current location that files will be downloaded to.
             *
             * @return The filepath that the file will be downloaded to.
             */
            public String getDownloadPath() {
                return this.downloadPath;
            }
    
            /**
             * Set the path that files will be downloaded to.
             *
             * @param filePath The filepath that the file will be downloaded to.
             */
            public void setDownloadPath(String filePath) {
                this.downloadPath = filePath;
            }
    
    
            /**
             * Load in all the cookies WebDriver currently knows about so that we can mimic the browser cookie state
             *
             * @param seleniumCookieSet
             * @return
             */
            private HttpState mimicCookieState(Set<org.openqa.selenium.Cookie> seleniumCookieSet) {
                HttpState mimicWebDriverCookieState = new HttpState();
                for (org.openqa.selenium.Cookie seleniumCookie : seleniumCookieSet) {
                    Cookie httpClientCookie = new Cookie(seleniumCookie.getDomain(), seleniumCookie.getName(), seleniumCookie.getValue(), seleniumCookie.getPath(), seleniumCookie.getExpiry(), seleniumCookie.isSecure());
                    mimicWebDriverCookieState.addCookie(httpClientCookie);
                }
                return mimicWebDriverCookieState;
            }
    
            /**
             * Mimic the WebDriver host configuration
             *
             * @param hostURL
             * @return
             */
            private HostConfiguration mimicHostConfiguration(String hostURL, int hostPort) {
                HostConfiguration hostConfig = new HostConfiguration();
                hostConfig.setHost(hostURL, hostPort);
                return hostConfig;
            }
    
            public String fileDownloader(WebElement element) throws Exception {
                return downloader(element, "href");
            }
    
            public String imageDownloader(WebElement element) throws Exception {
                return downloader(element, "src");
            }
    
            public String downloader(WebElement element, String attribute) throws Exception {
                //Assuming that getAttribute does some magic to return a fully qualified URL
                String downloadLocation = element.getAttribute(attribute);
                if (downloadLocation.trim().equals("")) {
                    throw new Exception("The element you have specified does not link to anything!");
                }
                URL downloadURL = new URL(downloadLocation);
                HttpClient client = new HttpClient();
                client.getParams().setCookiePolicy(CookiePolicy.RFC_2965);
                client.setHostConfiguration(mimicHostConfiguration(downloadURL.getHost(), downloadURL.getPort()));
                client.setState(mimicCookieState(driver.manage().getCookies()));
                HttpMethod getRequest = new GetMethod(downloadURL.getPath());
                FileHandler downloadedFile = new FileHandler(downloadPath + downloadURL.getFile().replaceFirst("/|\\\\", ""), true);
                try {
                    int status = client.executeMethod(getRequest);
                    LOGGER.info("HTTP Status {} when getting '{}'", status, downloadURL.toExternalForm());
                    BufferedInputStream in = new BufferedInputStream(getRequest.getResponseBodyAsStream());
                    int offset = 0;
                    int len = 4096;
                    int bytes = 0;
                    byte[] block = new byte[len];
                    while ((bytes = in.read(block, offset, len)) > -1) {
                        downloadedFile.getWritableFileOutputStream().write(block, 0, bytes);
                    }
                    downloadedFile.close();
                    in.close();
                    LOGGER.info("File downloaded to '{}'", downloadedFile.getAbsoluteFile());
                } catch (Exception Ex) {
                    LOGGER.error("Download failed: {}", Ex);
                    throw new Exception("Download failed!");
                } finally {
                    getRequest.releaseConnection();
                }
                return downloadedFile.getAbsoluteFile();
            }
        }
    

    Just curious, will that work for HTTPS? I guess no, as there is no support for SSL in your HttpClient configuration...

    It should work (http://hc.apache.org/httpclient-3.x/sslguide.html), but the Commons HttpClient library is now EOL, so you are better off using the HttpComponents library: http://hc.apache.org/httpcomponents-client-ga/examples.html

    I want to download a file to check if it is containing the right data.

    This wouldn't work with HttpOnly cookies, would it?

  • The best way I have found to do this is by accessing the page, getting the download link, and performing a HEAD request for the file with an HTTP library. The response will contain the length of the file and its type.

    A HEAD request is preferable since it will only retrieve the headers instead of pulling down the entire file.

    And if the file is behind auth, you will need to pull the session cookie from Selenium's cookie store and pass it into the HTTP library when performing the request.

    That, or you can configure the browser you're using to auto-download files to a specific location and then perform checks against the file on disk.
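
    The HEAD-request check can be sketched in Python with the standard library. This is an illustration under assumptions, not the Ruby code this answer refers to: the names `head_request`, `summarize` and `probe` are hypothetical, and the `cookie` argument is a ready-made Cookie header string built from Selenium's cookie store.

```python
import urllib.request


def head_request(url, cookie=None):
    """Build a HEAD request, optionally carrying a Cookie header for auth."""
    request = urllib.request.Request(url, method="HEAD")
    if cookie:
        request.add_header("Cookie", cookie)
    return request


def summarize(status, headers):
    """Pick the status, size and type out of a response's headers."""
    length = headers.get("Content-Length")
    return {
        "status": status,
        "size": int(length) if length is not None else None,
        "type": headers.get("Content-Type"),
    }


def probe(url, cookie=None):
    """Send the HEAD request; urllib raises HTTPError for 4xx/5xx responses."""
    with urllib.request.urlopen(head_request(url, cookie)) as response:
        return summarize(response.status, response.headers)
```

    A test could then assert on `probe(href)["status"]` and on the reported size. Not every server sends Content-Length in response to HEAD, which is why `size` may be `None`.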

    I've outlined each of these approaches in detail with working Ruby code here:

    Cheers,
    Dave H
    @TourDeDave

    In general, this is nice, as long as the page does not require a login. As soon as the user has to log in, this solution will break.

  • Cross platform and Python. This is the best approach.

    Link only answers are discouraged. Please add the solution part from the link to your answer.

  • I made my own version of the downloader, using an AJAX request and returning the bytes. It has the advantage that it uses the browser directly, so authentication and cookies do not need to be dealt with. It has the disadvantages that you're restricted by the same-origin policy, that it might need a lot of memory, and that it may fail in old browsers.

    Still, it is sometimes very useful:

    import org.openqa.selenium.JavascriptExecutor;
    import org.openqa.selenium.WebDriver;
    
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.concurrent.TimeUnit;
    
    public class AjaxFileDownloader {
    
        private WebDriver driver;
    
        public AjaxFileDownloader(WebDriver driverObject) {
            this.driver = driverObject;
            driver.manage().timeouts().setScriptTimeout(15, TimeUnit.SECONDS); // maybe you need a different timeout
        }
    
        public InputStream download(String url) throws IOException {
            String script = "var url = arguments[0];" +
                    "var callback = arguments[arguments.length - 1];" +
                    "var xhr = new XMLHttpRequest();" +
                    "xhr.open('GET', url, true);" +
                    "xhr.responseType = \"arraybuffer\";" + // ask the browser to deliver the response body as an ArrayBuffer
                    "xhr.onload = function() {" +
                    "  var arrayBuffer = xhr.response;" +
                    "  var byteArray = new Uint8Array(arrayBuffer);" +
                    "  callback(byteArray);" +
                    "};" +
                    "xhr.send();";
            Object response = ((JavascriptExecutor) driver).executeAsyncScript(script, url);
            // Selenium returns an Array of Long, we need byte[]
            ArrayList<Long> byteList = (ArrayList<Long>) response;
            byte[] bytes = new byte[byteList.size()];
            for(int i = 0; i < byteList.size(); i++) {
                bytes[i] = (byte)(long)byteList.get(i);
            }
            return new ByteArrayInputStream(bytes);
        }
    
    }
    

    Amazing this works great, thank you very much for this up-to-date answer!

    I modified this a little by setting the responsetype to "text" and just doing the callback without the array stuff. It works beautifully. I needed it for csv file.
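
    For readers using the Python bindings, as in the question, the same in-browser download idea can be sketched as follows. It is an adaptation rather than a port of the Java class: the name `download_via_browser` is hypothetical, the script converts the `Uint8Array` to a plain array on the assumption that this serializes more predictably across drivers, and an `onerror` handler is added so a failed request does not simply wait for the script timeout.

```python
FETCH_SCRIPT = """
var url = arguments[0];
var callback = arguments[arguments.length - 1];
var xhr = new XMLHttpRequest();
xhr.open('GET', url, true);
xhr.responseType = 'arraybuffer';
xhr.onload = function () {
  // Convert to a plain array so it comes back as a list of numbers.
  callback(Array.from(new Uint8Array(xhr.response)));
};
xhr.onerror = function () { callback(null); };
xhr.send();
"""


def download_via_browser(driver, url):
    """Fetch url inside the browser session and return the body as bytes."""
    result = driver.execute_async_script(FETCH_SCRIPT, url)
    if result is None:
        raise IOError(f"Browser request for {url} failed")
    # Each list element is an integer in the range 0..255.
    return bytes(result)
```

    As with the Java version, the script timeout may need raising first, for example with `driver.set_script_timeout(15)`, and the same-origin and memory caveats still apply.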

  • This blog post describes a straightforward way of invoking another library to download the file (so not through the browser) whilst maintaining Selenium's session with the site, so it works on password-protected files, etc.

    I have realised this is similar to Ardesco's answer. However, I think the solution in the blog post is simpler. Also, it is for .NET rather than Java, so it may be useful to people targeting that platform.

    Windows only? Blargh.

  • There are many ways to download a file in Selenium; one of the easiest in Firefox is to use a Firefox profile.

    First add preferences to the profile, specifying the MIME type of the file, and then open Firefox with those preferences.

    I found the article below interesting; it covers this scenario:

    http://learn-automation.com/how-to-download-files-using-selenium-webdriver/
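
    In the Python bindings, the same preferences can be set on a Firefox options object. The sketch below is an illustration, not taken from the linked article: `apply_download_prefs` is a hypothetical helper, it relies only on the object having a `set_preference(name, value)` method (as `selenium.webdriver.FirefoxOptions` does), and the MIME types are joined with commas here by assumption.

```python
def apply_download_prefs(options, download_dir, mime_types):
    """Set the Firefox preferences that make downloads save without prompting.

    `options` is expected to be a selenium.webdriver.FirefoxOptions instance;
    any object with a set_preference(name, value) method will do.
    """
    prefs = {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }
    for name, value in prefs.items():
        options.set_preference(name, value)
    return prefs
```

    A typical use would create `options = webdriver.FirefoxOptions()`, call `apply_download_prefs(options, download_dir, ["text/csv"])`, and then start the browser with `webdriver.Firefox(options=options)`. The MIME type has to match what the server actually sends for the file.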

  • When you are using Selenium WebDriver with a Firefox profile, the best way to deal with the modal download window is to change the Firefox profile settings so that files download automatically to the desired location.

    The other answers here have useful info but not a working solution based on Java.

    Here is a code snippet that does the trick:

    FirefoxProfile profile = new FirefoxProfile();
    profile.setPreference("browser.download.folderList", 2);
    profile.setPreference("browser.download.manager.showWhenStarting", false);
    profile.setPreference("browser.download.dir", "C:\\downloads");
    profile.setPreference("browser.helperApps.neverAsk.saveToDisk",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;"
            + "application/pdf;"
            + "application/vnd.openxmlformats-officedocument.wordprocessingml.document;"
            + "text/plain;"
            + "text/csv");
    WebDriver driver = new FirefoxDriver(profile);
    
  • SCALA:

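    import java.net.URL
    import org.openqa.selenium.By
    import org.openqa.selenium.remote.{DesiredCapabilities, RemoteWebDriver}

    // Connect to a ChromeDriver instance listening on localhost:9515.
    val remote = new RemoteWebDriver(new URL("http://localhost:9515/"),
      DesiredCapabilities.chrome())
    val links = remote.findElements(By.xpath("""//td[6]/a[2]""")).iterator()

    // Click each matching link, pausing briefly between clicks.
    while (links.hasNext) {
      val link = links.next()
      println("a href: " + link.getAttribute("href") + "; " + link.getText)
      link.click()
      Thread.sleep(1000)
    }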
    

    For fully automatic operation, you need to read about DesiredCapabilities, as in this question: https://stackoverflow.com/questions/23510816/how-to-handle-downloading-a-file-in-selenium-webdriver

License under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM