download.url
url
getURL
getForm
postForm
getURL
download.url
getURL("http://www.omegahat.org/RCurl/index.html")
getURL("https://sourceforge.net")
download.url
getURL
getURL
getCurlOptionsConstants
names(getCurlOptionsConstants())
sort(names(getCurlOptionsConstants()))
  [1] "autoreferer"             "buffersize"
  [3] "cainfo"                  "capath"
  [5] "closepolicy"             "connecttimeout"
  [7] "cookie"                  "cookiefile"
  [9] "cookiejar"               "cookiesession"
 [11] "crlf"                    "customrequest"
 [13] "debugdata"               "debugfunction"
 [15] "dns.cache.timeout"       "dns.use.global.cache"
 [17] "egdsocket"               "encoding"
 [19] "errorbuffer"             "failonerror"
 [21] "file"                    "filetime"
 [23] "followlocation"          "forbid.reuse"
 [25] "fresh.connect"           "ftp.create.missing.dirs"
 [27] "ftp.response.timeout"    "ftp.ssl"
 [29] "ftp.use.eprt"            "ftp.use.epsv"
 [31] "ftpappend"               "ftplistonly"
 [33] "ftpport"                 "header"
 [35] "headerfunction"          "http.version"
 [37] "http200aliases"          "httpauth"
 [39] "httpget"                 "httpheader"
 [41] "httppost"                "httpproxytunnel"
 [43] "infile"                  "infilesize"
 [45] "infilesize.large"        "interface"
 [47] "ipresolve"               "krb4level"
 [49] "low.speed.limit"         "low.speed.time"
 [51] "maxconnects"             "maxfilesize"
 [53] "maxfilesize.large"       "maxredirs"
 [55] "netrc"                   "netrc.file"
 [57] "nobody"                  "noprogress"
 [59] "nosignal"                "port"
 [61] "post"                    "postfields"
 [63] "postfieldsize"           "postfieldsize.large"
 [65] "postquote"               "prequote"
 [67] "private"                 "progressdata"
 [69] "progressfunction"        "proxy"
 [71] "proxyauth"               "proxyport"
 [73] "proxytype"               "proxyuserpwd"
 [75] "put"                     "quote"
 [77] "random.file"             "range"
 [79] "readfunction"            "referer"
 [81] "resume.from"             "resume.from.large"
 [83] "share"                   "ssl.cipher.list"
 [85] "ssl.ctx.data"            "ssl.ctx.function"
 [87] "ssl.verifyhost"          "ssl.verifypeer"
 [89] "sslcert"                 "sslcertpasswd"
 [91] "sslcerttype"             "sslengine"
 [93] "sslengine.default"       "sslkey"
 [95] "sslkeypasswd"            "sslkeytype"
 [97] "sslversion"              "stderr"
 [99] "tcp.nodelay"             "telnetoptions"
[101] "timecondition"           "timeout"
[103] "timevalue"               "transfertext"
[105] "unrestricted.auth"       "upload"
[107] "url"                     "useragent"
[109] "userpwd"                 "verbose"
[111] "writefunction"           "writeheader"
[113] "writeinfo"
Each of these options and what it controls is described in the libcurl manual page for curl_easy_setopt, and that is the authoritative documentation. Anything we provide here is merely repetition or additional explanation.
The names of the options require a slight explanation. They correspond to symbolic names in the C code of libcurl. For example, the option url in R corresponds to CURLOPT_URL in C. Firstly, uppercase letters are annoying to type and read, so we have mapped them to lowercase letters in R. We have also removed the prefix "CURLOPT_" since we know the context in which the option names are being used. And lastly, any option names that contain a '_' (after we have removed the CURLOPT_ prefix) have the '_' replaced with a '.' so that we can type them in R without having to quote them. For example, combining these three rules, CURLOPT_URL becomes url and CURLOPT_NETRC_FILE becomes netrc.file. That is the mapping scheme.
The code that handles options in RCurl automatically maps the user's inputs to lower case. This means that you can use any mixture of upper case that makes your code more readable to you and others. For example, we might write
writeFunction = basicTextGatherer()
or
HTTPHeader = c(Accept="text/html")
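As a small illustration of the mapping scheme itself, the following sketch applies the three rules described above. (asRCurlOptName is a hypothetical helper used only for illustration; it is not part of RCurl.)
asRCurlOptName = function(x) {
   x = tolower(x)                    # rule 1: map to lower case
   x = sub("^curlopt_", "", x)       # rule 2: drop the CURLOPT_ prefix
   gsub("_", ".", x, fixed = TRUE)   # rule 3: replace '_' with '.'
}
asRCurlOptName(c("CURLOPT_URL", "CURLOPT_NETRC_FILE"))
[1] "url"        "netrc.file"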
We specify one or more options by using these names. To make
interactive use easier, we perform partial matching of the names
against the set of known names. So, for example, we could specify
getURL("http://www.omegahat.org/RCurl/testPassword",
verbose = TRUE)
or, using the partial matching, simply
getURL("http://www.omegahat.org/RCurl/testPassword",
v = TRUE)
[1] "autoreferer" "buffersize" [3] "closepolicy" "connecttimeout" [5] "cookiesession" "crlf" [7] "dns.cache.timeout" "dns.use.global.cache" [9] "failonerror" "followlocation" [11] "forbid.reuse" "fresh.connect" [13] "ftp.create.missing.dirs" "ftp.response.timeout" [15] "ftp.ssl" "ftp.use.eprt" [17] "ftp.use.epsv" "ftpappend" [19] "ftplistonly" "header" [21] "http.version" "httpauth" [23] "httpget" "httpproxytunnel" [25] "infilesize" "ipresolve" [27] "low.speed.limit" "low.speed.time" [29] "maxconnects" "maxfilesize" [31] "maxredirs" "netrc" [33] "nobody" "noprogress" [35] "nosignal" "port" [37] "post" "postfieldsize" [39] "proxyauth" "proxyport" [41] "proxytype" "put" [43] "resume.from" "ssl.verifyhost" [45] "ssl.verifypeer" "sslengine.default" [47] "sslversion" "tcp.nodelay" [49] "timecondition" "timeout" [51] "timevalue" "transfertext" [53] "unrestricted.auth" "upload" [55] "verbose"The <curl:opt>connecttimeout</curl:opt> gives the maximum number of seconds the connection should take before raising an error, so this is a number. The <curl:opt>header</curl:opt> option, on the other hand, is merely a flag to indicate whether header information from the response should be included. So this can be a logical value (or a number that is 0 to say FALSE or non-zero for TRUE.) At present, all numbers passed from R are converted to long when used in libcurl. Many options are specified as strings. For example, we can specify the user password for a URI as
getURL("http://www.omegahat.org/RCurl/testPassword/index.html", userpwd = "bob:duncantl", verbose = TRUE)
getURL("http://www.omegahat.org/RCurl/index.html", useragent="RCurl", referer="http://www.omegahat.org")
getURL("http://www.omegahat.org/RCurl", httpheader = c(Accept="text/html", 'Made-up-field' = "bob"))
> getURL("http://www.omegahat.org", httpheader = c(Accept="text/html", 'Made-up-field' = "bob"), verbose = TRUE)
* About to connect() to www.omegahat.org port 80
* Connected to www.omegahat.org (169.237.46.32) port 80
> GET / HTTP/1.1
Host: www.omegahat.org
Pragma: no-cache
Accept: text/html
Made-up-field: bob
(Note that not all servers will tolerate setting header fields arbitrarily
and may return an error.)
The key thing to note is that headers are specified as name-value
pairs in a character vector. R takes these and pastes the name and
value together and passes the resulting character vector to libcurl.
So while it is convenient to express the headers as
c(name = "value", name = "value")
they can equally well be given already combined, in the form
c("name: value", "name: value")
getNativeSymbolInfo
basicTextGatherer
getURL
getURL
basicTextGatherer
h = basicTextGatherer()
txt = getURL("http://www.omegahat.org/RCurl", header = TRUE, headerfunction = h$update)
getURL
h$value()
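The same gatherer mechanism works for the body of the reply. A sketch (assuming the same Omegahat URL is reachable), using curlPerform() which is discussed further below:
bodyText = basicTextGatherer()
curlPerform(url = "http://www.omegahat.org/RCurl", writefunction = bodyText$update)
txt = bodyText$value()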
debugGatherer
If the verbose option is TRUE, libcurl will provide a lot of information about
its actions. By default, these messages are written on the console
(i.e. stderr). In some cases, we do not want them on the
screen but would rather have them, for example, displayed in a GUI or stored in a
variable for closer examination. We can do this by providing a
callback function for the debugging output via the debugfunction
option for libcurl.
The debugGatherer() function provides a callback of this kind: it collects the debugging output so that we can examine it afterwards.
d = debugGatherer()
x = getURL("http://www.omegahat.org/RCurl", debugfunction=d$update, verbose = TRUE)
names(d$value())
[1] "text"      "headerIn"  "headerOut" "dataIn"    "dataOut"
The headerIn and headerOut fields report the text of the header of the response from the Web server and of our request, respectively. Similarly, the dataIn and dataOut fields give the body of the response and of the request. The text field contains general messages from libcurl. We should note that not all options are (currently) meaningful in R. For example, it is not currently possible to redirect standard error for libcurl to a different FILE* via the "stderr" option. (In the future, we may be able to specify an R function for writing errors from libcurl, but we have not put that in yet.)
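We can look at any one of these channels directly, e.g. the header we sent:
cat(d$value()[["headerOut"]])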
http://www.omegahat.org/cgi-bin/form.pl?a=1&b=2
getForm
postForm
getForm("http://www.google.com/search", hl="en", lr="", ie="ISO-8859-1", q="RCurl", btnG="Search")
htmlTreeParse
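The result of getForm() is just the text of the page, so it can be handed to an HTML parser. A sketch, assuming the XML package is installed and using the same Google query as above:
library(XML)
txt = getForm("http://www.google.com/search", hl="en", lr="", ie="ISO-8859-1", q="RCurl", btnG="Search")
doc = htmlTreeParse(txt, asText = TRUE)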
getForm
curlEscape
curlUnescape
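The curlEscape() and curlUnescape() functions perform the percent-encoding used in such query strings, should we need to do it ourselves. A quick round trip (output approximate):
curlEscape("a b & c=d")
[1] "a%20b%20%26%20c%3Dd"
curlUnescape("a%20b%20%26%20c%3Dd")
[1] "a b & c=d"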
postForm
postForm("http://www.speakeasy.org/~cgires/perl_form.cgi",
"some_text" = "Duncan",
"choice" = "Ho",
"radbut" = "eep",
"box" = "box1, box2"
)
getForm
postForm
getURL
getCurlHandle
handle = getCurlHandle()
a = getURL("http://www.omegahat.org/RCurl", curl = handle)
b = getURL("http://www.omegahat.org/", curl = handle)
If we had specified the header=TRUE option in the first call
above, it would remain set for the second call. This can sometimes be
inconvenient. In such cases, either use separate libcurl handles, or
reset the options.
The function dupCurlHandle() can be used to create a copy of an existing handle, with the same option settings, which can then be customized independently.
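A brief sketch of duplicating a handle so that the copy can be modified without disturbing the original:
h1 = getCurlHandle(verbose = TRUE)
h2 = dupCurlHandle(h1)   # h2 starts with the same options as h1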
curlPerform
getURL
getCurlInfo
h = getCurlHandle()
getURL("http://www.omegahat.org", curl = h)
names(getCurlInfo(h))
[1] "effective.url" "response.code" [3] "total.time" "namelookup.time" [5] "connect.time" "pretransfer.time" [7] "size.upload" "size.download" [9] "speed.download" "speed.upload" [11] "header.size" "request.size" [13] "ssl.verifyresult" "filetime" [15] "content.length.download" "content.length.upload" [17] "starttransfer.time" "content.type" [19] "redirect.time" "redirect.count" [21] "private" "http.connectcode" [23] "httpauth.avail" "proxyauth.avail"These provide us the actual name of the URI downloaded after redirections, etc.; information about the transfer speed, etc.; etc. See the man page for curl_easy_getinfo.
We can find out about the version of libcurl that is being used, and the features it supports, via the curlVersion() function:
curlVersion()
$age
[1] 2
$version
[1] "7.12.0"
$version_num
[1] 461824
$host
[1] "powerpc-apple-darwin7.4.0"
$features
     ipv6       ssl      libz      ntlm largefile
        1         4         8        16       512
$ssl_version
[1] " OpenSSL/0.9.7b"
$ssl_version_num
[1] 9465903
$libz_version
[1] "1.2.1"
$protocols
[1] "ftp" "gopher" "telnet" "dict" "ldap" "http" "file" "https"
[9] "ftps"
$ares
[1] ""
$ares_num
[1] 0
$libidn
[1] ""
The help page for the R function curlVersion() explains the fields,
which are hopefully clear from the names.
The only ones that might be obscure are
ares and libidn.
ares refers to asynchronous domain name server (DNS) lookup
for resolving the IP address (e.g. 128.41.12.2) corresponding to a machine name (e.g. www.omegahat.org).
"GNU Libidn is an implementation of the Stringprep, Punycode and IDNA specifications defined by the IETF Internationalized Domain Names (IDN)"
(taken from http://www.gnu.org/software/libidn/).
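One simple use of this information is to check at run time whether the libcurl we are linked against supports a given protocol, e.g.
"https" %in% curlVersion()$protocols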
The curlGlobalInit() function initializes the libcurl library as a whole, and its flags argument controls which features are set up. The possible flag values are given by the following bit indicators:
 none   ssl win32   all
    0     1     2     3
attr(,"class")
[1] "CurlGlobalBits" "BitIndicator"
We would call curlGlobalInit() with the relevant flag names, e.g.
curlGlobalInit(c("ssl", "win32"))
or
curlGlobalInit(c("ssl"))
setBitIndicators
curlOpts
mapCurlOptNames
opts = curlOptions(header = TRUE, userpwd = "bob:duncantl", netrc = TRUE)
getURL("http://www.omegahat.org/RCurl/testPassword/index.html", verbose = TRUE, .opts = opts)
h = getCurlHandle(header = TRUE, userpwd = "bob:duncantl", netrc = TRUE)
getURL("http://www.omegahat.org/RCurl/testPassword/index.html", verbose = TRUE, curl = h)
getURL
curlSetOpt
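curlSetOpt() sets options on an existing handle after it has been created; a sketch, using the same password-protected URL as before:
h = getCurlHandle()
curlSetOpt(userpwd = "bob:duncantl", netrc = TRUE, curl = h)
getURL("http://www.omegahat.org/RCurl/testPassword/index.html", curl = h)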
POST /hibye.cgi HTTP/1.1
Connection: close
Accept: text/xml
Accept: multipart/*
Host: services.soaplite.com
User-Agent: SOAP::Lite/Perl/0.55
Content-Length: 450
Content-Type: text/xml; charset=utf-8
SOAPAction: "http://www.soaplite.com/Demo#hi"
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/1999/XMLSchema"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance">
<SOAP-ENV:Body>
<namesp1:hi xmlns:namesp1="http://www.soaplite.com/Demo"/>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Accept: text/xml
Accept: multipart/*
SOAPAction: "http://www.soaplite.com/Demo#hi"
Content-Type: text/xml; charset=utf-8
body = '<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsd="http://www.w3.org/1999/XMLSchema"
  xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
  xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance">
<SOAP-ENV:Body>
<namesp1:hi xmlns:namesp1="http://www.soaplite.com/Demo"/>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>\n'
curlPerform(url="http://services.soaplite.com/hibye.cgi",
httpheader=c(Accept="text/xml", Accept="multipart/*", SOAPAction='"http://www.soaplite.com/Demo#hi"',
'Content-Type' = "text/xml; charset=utf-8"),
postfields=body,
verbose = TRUE
)
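If we want to capture the SOAP reply in a variable rather than have it written to the screen, we can add a text gatherer to the same call (a sketch, reusing the body and headers above):
reply = basicTextGatherer()
curlPerform(url="http://services.soaplite.com/hibye.cgi",
            httpheader=c(Accept="text/xml", Accept="multipart/*",
                         SOAPAction='"http://www.soaplite.com/Demo#hi"',
                         'Content-Type' = "text/xml; charset=utf-8"),
            postfields=body,
            writefunction = reply$update)
reply$value()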
getURL
curlPerform
curlPerform
getURL
getURL
getURL
htmlTreeParse
htmlTreeParse
getURL
[1] I have a local version (not with SSL) but they are not connections since the connection data structure is not exposed in the R API, yet!