Undocumented RCurl "progressfunction" with URL redirection

Consider this simple RCurl function that reports download progress:

library(RCurl)

# Download `url`, printing the vector passed to the progress callback.
curlDown <- function(url, follow = TRUE) {
    x <- getURL(url, followlocation = follow, noprogress = FALSE,
        progressfunction = function(down, up) cat(down, '\n'))
}

Note that with followlocation=TRUE (the default here) we follow any redirect that the server sends as part of the HTTP response header.

We get:

curlDown("http://www.example.com")
# 0 0 
# 1270 1079 
# 1270 1127 
# 1270 1270 
# 1270 1270 
# 1270 1270 

As you can see, the variable down passed to the callback by RCurl is a numeric vector whose first element is the total size of the download in bytes and whose second element is the number of bytes downloaded so far. For space reasons I don't show it here, but a separate check showed that the first element corresponds to the Content-Length field in the response header.

Not every server sends a Content-Length field in the response header:

curlDown("http://www.google.it")
# 0 0  
# 0 603
# ... blah blah
# 0 44848 
# 0 44848 

In this case RCurl reports the total size as 0 (would NA be more appropriate?).
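
To make the two elements of down concrete, here is a minimal sketch of a formatter for the vector the callback receives. It assumes, based only on the transcripts above, that a total of 0 means the server sent no Content-Length. The name format_progress is made up for this example and is not part of RCurl.

```r
# Format one progress report; `down` is c(total_bytes, bytes_so_far).
format_progress <- function(down) {
  total <- down[1]
  now <- down[2]
  if (is.na(total) || total <= 0) {
    # Assumption: a total of 0 means "size unknown" (no Content-Length),
    # so only the running byte count is reported.
    sprintf("%.0f bytes", now)
  } else {
    sprintf("%.0f of %.0f bytes (%.0f%%)", now, total, 100 * now / total)
  }
}

# Replay two of the reports shown above.
cat(format_progress(c(1270, 1079)), "\n")
cat(format_progress(c(0, 603)), "\n")
```

It could be plugged into the callback as progressfunction = function(down, up) cat(format_progress(down), '\n').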

Requesting Google's ".com" site redirects to the ".it" site. With the redirect followed:

curlDown("http://www.google.com")
# 0 0 
# 274 274 
# 274 274 
# 274 274 
# 274 0 
# 274 0 
# 274 603
# ... blah blah
# 274 44896 
# 274 44896 

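After the redirect, the current size climbs much as it does for curlDown("http://www.google.it"), but the total stays at 274, which is wrong!

Without following the redirect: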

curlDown("http://www.google.com", follow=FALSE)
# 0 0 
# 274 274 
# 274 274 
# 274 274 

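The ".com" response carries a Content-Length of 274, which is the size of the redirect response itself, not of the page it points to (cf. curlDown("http://www.google.it")).

So, when the redirect is followed, RCurl (or the library underneath) keeps reporting 274 as the total.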

Is there a way around this?


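You could handle the redirect manually (i.e. without followlocation), issuing a separate request for the new location (optionally with your own curl handle).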

curlDown <- function(url, curl = NULL) {
    if (is.null(curl)) curl <- getCurlHandle()
    h <- basicHeaderGatherer()   # collects the response headers
    x <- getURL(url, curl = curl, noprogress = FALSE,
        headerfunction = h$update,
        progressfunction = function(down, up) cat(down, '\n'))
    # If the server answered with a redirect, request the new location ourselves.
    loc <- h$value()["Location"]
    if (!is.na(loc)) curlDown(loc)
}

This gives:

# curlDown("http://www.google.com") 
# 0 0 
# 258 258 
# 258 258 
# 258 258 
# 0 0 
# 0 603 
# 0 2003 
# ... blah blah
# 0 44824 
# 0 44824 
# 0 44824 

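Each request now reports its own total: 258 for the redirect response, then 0 for the final page, which sends no Content-Length.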
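
The recursive function above has no limit on how many redirects it will chase. As a rough sketch of the same manual approach with a cap, the loop below takes a fetch_location function that returns the Location header of a response, or NA when there is none. Both follow_manually and fetch_location are hypothetical names for this illustration; in real use fetch_location would wrap the getURL call with basicHeaderGatherer shown above. A fake lookup table stands in for the network here.

```r
# Follow redirects by hand, giving up after `max_redirects` hops.
# `fetch_location(url)` is a hypothetical helper returning the Location
# header for `url`, or NA if the response is not a redirect.
follow_manually <- function(url, fetch_location, max_redirects = 5) {
  visited <- url
  hops <- 0
  repeat {
    loc <- fetch_location(url)
    if (is.na(loc)) return(visited)   # no Location header: final response
    hops <- hops + 1
    if (hops > max_redirects) stop("too many redirects, last URL: ", url)
    url <- loc
    visited <- c(visited, url)
  }
}

# Stand-in for the network: a.example redirects to b.example.
fake <- c("http://a.example" = "http://b.example", "http://b.example" = NA)
fetch <- function(u) unname(fake[u])
print(follow_manually("http://a.example", fetch))
```

This sketch does not resolve relative Location values, which a real implementation would need to handle.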


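RCurl is only a wrapper around curl here: it sets the callback through curl_easy_setopt with CURLOPT_PROGRESSFUNCTION, so the behaviour comes from curl itself. A small C program shows the same thing: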

#include <stdio.h>
#include <curl/curl.h>

/* Progress callback: returns int; a non-zero value would abort the transfer. */
static int progress(void *clientp, double dltotal, double dlnow,
                    double ultotal, double ulnow)
{
    (void)clientp;  /* unused */
    fprintf(stderr, "PROGRESS: %.0f %.0f %.0f %.0f\n",
            dltotal, dlnow, ultotal, ulnow);
    return 0;
}

int main(int argc, char **argv)
{
    CURL *curl;
    CURLcode res;

    if (argc < 2) {
        fprintf(stderr, "usage: %s URL\n", argv[0]);
        return 1;
    }

    curl = curl_easy_init();
    if (!curl)
        return 1;
    curl_easy_setopt(curl, CURLOPT_URL, argv[1]);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);
    curl_easy_setopt(curl, CURLOPT_PROGRESSFUNCTION, progress);
    res = curl_easy_perform(curl);
    if (res != CURLE_OK)
        fprintf(stderr, "curl: %s\n", curl_easy_strerror(res));
    curl_easy_cleanup(curl);

    return res == CURLE_OK ? 0 : 1;
}

$ clang curl.c -lcurl && ./a.out http://google.com > /dev/null
PROGRESS: 0 0 0 0
PROGRESS: 0 0 0 0
PROGRESS: 219 219 0 0
PROGRESS: 219 219 0 0
PROGRESS: 219 219 0 0
PROGRESS: 219 219 0 0
PROGRESS: 219 0 0 0
PROGRESS: 219 2097 0 0
PROGRESS: 219 6441 0 0
PROGRESS: 219 12233 0 0
PROGRESS: 219 20921 0 0
PROGRESS: 219 32505 0 0
PROGRESS: 219 45360 0 0
PROGRESS: 219 45360 0 0
PROGRESS: 219 45360 0 0
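
In the output above, the downloaded count drops back to 0 when the redirected transfer starts, while the total stays at 219. One possible workaround on the R side, sketched below, is to treat such a drop as the start of a new response and to report the total as unknown (NA) while it still equals the left-over value. This is a heuristic, not something RCurl or curl provides: make_redirect_aware and its report argument are hypothetical names, and it would misfire if the second response happened to have the same Content-Length as the first. The example replays a few reports from the output above instead of using the network.

```r
# Wrap a reporting function so that a stale total after a redirect is
# passed on as NA. `report(total, now)` is supplied by the caller.
make_redirect_aware <- function(report) {
  prev_total <- NA
  prev_now <- 0
  stale_total <- NA
  function(down, up = NULL) {
    total <- down[1]
    now <- down[2]
    if (now < prev_now) {
      # The byte count went backwards, so a new response body has started.
      # If the total did not change with it, assume it is left over.
      stale_total <<- if (identical(total, prev_total)) total else NA
    }
    prev_total <<- total
    prev_now <<- now
    if (!is.na(stale_total) && total == stale_total) total <- NA
    report(total, now)
  }
}

# Replay part of the sequence printed by the C program.
seen <- list()
cb <- make_redirect_aware(function(total, now) {
  seen[[length(seen) + 1]] <<- c(total, now)
})
for (d in list(c(219, 219), c(219, 0), c(219, 2097), c(219, 45360))) cb(d)
print(do.call(rbind, seen))
```

The returned closure takes (down, up) like the callbacks earlier on this page, so it could be passed as progressfunction in place of the plain cat() callback.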

In short, it is not possible to create an accurate progress bar for a site that uses chunked transfer encoding (that is, situations in which there is no "Content-Length" header).

You need to either skip the progress bar in these cases (see, for example, my answer to your previous question) or start from a deliberately oversized estimate of the file size, knowing that the bar will never reach 100%.
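
As an illustration of the second option, here is a small sketch that turns a progress report into a fraction for a progress bar. When the total is known it uses it; otherwise it scales against a guessed size and caps the result at 0.99, so the bar never claims to be finished. The name progress_fraction, the guess argument and the 0.99 cap are all assumptions made for this example. The bar itself is the standard txtProgressBar from R's utils package.

```r
# Convert c(total_bytes, bytes_so_far) into a fraction between 0 and 1.
progress_fraction <- function(down, guess = 1e6) {
  total <- down[1]
  now <- down[2]
  if (!is.na(total) && total > 0) {
    # Known size: cap at 1 in case the total is stale or too small.
    min(now / total, 1)
  } else {
    # Unknown size: scale against the guess and never claim completion.
    min(now / guess, 0.99)
  }
}

# Replay a few reports from a download without Content-Length.
pb <- utils::txtProgressBar(min = 0, max = 1, style = 3)
for (now in c(0, 603, 20000, 44848)) {
  utils::setTxtProgressBar(pb, progress_fraction(c(0, now), guess = 1e5))
}
close(pb)
```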
