Introduction |
GETURL is a small but quite useful internet utility: a command-line tool that retrieves a web page (or any file accessible over the HTTP protocol).
You can see it as the "poor man's web browser" (and therefore wonder why you should use geturl where Netscape does a far better job), but it is a bit more useful than that: it opens the door to all sorts of automated tasks. It is also a convenient web developer's tool (geturl can be used to view the HTTP header sent with a page, for instance to find out what kind of server is at the other end). Last but not least, geturl is a convenient way to send a file to a remote .cgi using the HTTP POST protocol.
The HTTP features explicitly supported are GET and POST, with or without a firewall/proxy, plus the basic authentication scheme (that is: retrieving pages locked by a password).
Near-ANSI C sources (network functions aren't part of the ANSI standard) are provided that compile both on a UNIX platform and on a Windows 95/NT box.
To give you some idea of the possible applications of geturl, here are a few things I did where it played some useful role:
[] In an intranet where people had no access to the internet, I used geturl to provide access to some invaluable internet services such as the phone directory. The page was fetched from the web, adverts were removed, links re-targeted or added to intranet features, and the page layout redone to look similar to internal pages.
[] While programming ReyServe (see it in the kitchen tools page), geturl proved invaluable both to debug ReyServe and to analyse how other web servers reacted.
[] I also used geturl to send files through a firewall that wouldn't accept ftp sessions, only intranet http.
Recent versions history |
Installation of geturl |
There is no need to install geturl: it is a plain, standalone command-line utility. The only thing you might want to do is copy geturl to a location on your path (that is: C:\WINDOWS\COMMAND or /usr/bin).
Using geturl |
Simple fetch |
Provided you have no firewall and the only thing you want to do is retrieve a web page (the Netscape home page, for instance), all you have to do is:
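A plausible invocation looks like this (the bare-URL syntax is an assumption based on the rest of this page; since verbose output goes to stderr, the page itself presumably goes to stdout, so redirect it to a file):

```
geturl http://home.netscape.com/ > netscape.html
```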
Fetch through a firewall/proxy |
If you want to retrieve the same Netscape homepage through a proxy/firewall machine (let's suppose that machine is called dredd):
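The exact proxy switch isn't shown on this page, so the -p option below is only a guess; check geturl's usage message for the real flag and whether a port number is needed:

```
geturl -p dredd http://home.netscape.com/ > netscape.html
```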
Fetching password protected pages |
Suppose your login is bond and your password is james007 and you want to load the MI6 home page :-)
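The syntax geturl uses to pass credentials isn't shown here, so the user:password@host form below is an assumption. What is certain is what basic authentication puts on the wire: the login and password joined by a colon and base64-encoded. The sketch reproduces that with standard shell tools:

```shell
# Hypothetical geturl invocation (credential syntax assumed -- check the
# usage message):
#   geturl http://bond:james007@www.mi6.example.gov/
#
# Basic authentication sends "login:password" base64-encoded in an
# Authorization header; this prints the header such a request would carry:
printf 'Authorization: Basic %s\n' "$(printf 'bond:james007' | base64)"
```

Any server (or packet trace) will show exactly this header when the page is password-protected with the basic scheme.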
Debugging... |
geturl never stops and you wonder what's happening? You're using geturl in a shell script and it doesn't work? That's time for debugging. Using the -v flag, you put geturl in verbose mode. This ensures that most operations done by geturl appear in real time on the screen (on stderr, to be precise).
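For example (-v is the only part documented on this page; the rest of the syntax follows the earlier examples), the page still lands in the file while the progress messages go to the screen:

```
geturl -v http://home.netscape.com/ > netscape.html
```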
What do you want... page or header ? |
If you want the HTTP header to be added to geturl's output, just use -h. If you want the page content to be removed from the output, use -d. Here is the meaning of the various -h/-d combinations:
-h | Get the page with its HTTP header |
-h -d | Get only the HTTP header |
(nothing) | Get only the page |
-d | Get nothing |
For instance, here's how you can find out what header the Netscape server sends back:
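Using the flags from the table above (URL placement assumed, as in the earlier examples), a header-only request would look like:

```
geturl -h -d http://home.netscape.com/
```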
Geturl and .cgi scripts |
Of course, you can use geturl to retrieve any kind of file, including .cgi results:
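A GET request to a script works like any other fetch; the site and script below are hypothetical, and the quotes keep the shell from interpreting the ? and & in the URL:

```
geturl "http://www.some_site.com/cgi-bin/search.cgi?query=geturl"
```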
Using POST instead of GET |
Some .cgi scripts must be called using another protocol (POST) where the parameters are not appended to the .cgi URL but sent separately (this kind of call is typically found in HTML forms with large or numerous text fields where it would be impractical to append those field contents to the URL).
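None of geturl's own syntax is needed to see what POST actually sends. The sketch below builds, with standard shell tools, the request any HTTP client would emit for two small hypothetical fields: the parameters travel url-encoded in the request body rather than in the URL, announced by a Content-Length header:

```shell
# Two hypothetical form fields, url-encoded and joined with '&', exactly as
# a POST body carries them:
body='nick=joe&msg=hello+world'
# The request line and headers precede the body; Content-Length is the byte
# count of the body (trailing newline added here only for display):
printf 'POST /cgi-bin/chat.cgi HTTP/1.0\r\n'
printf 'Content-Type: application/x-www-form-urlencoded\r\n'
printf 'Content-Length: %s\r\n\r\n%s\n' "${#body}" "$body"
```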
A typical application of the post protocol is HTML chat rooms. Suppose you want to talk to a chat room .cgi that requires two parameters: a nickname and a message. Then, put your message in a file (my_msg.txt), your nickname in another file (my_nick.txt) and type this command:
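The command itself was lost from this copy of the page, so the line below is only a reconstruction: the site is hypothetical, and the way geturl recognizes that a parameter value names a file to be read is not documented here; check geturl's usage message for the real syntax:

```
geturl "http://www.a_chat_site.com/cgi-bin/chat.cgi?nickname=my_nick.txt&message=my_msg.txt"
```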
You might think that putting the message in a text file is okay but putting the nickname in another file is tiresome. No problem: you can mix the get and post syntaxes and let geturl convert it all to post:
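Again a reconstruction under the same assumptions (hypothetical site, file-parameter syntax guessed): here the nickname is given literally on the command line while the message still comes from a file:

```
geturl "http://www.a_chat_site.com/cgi-bin/chat.cgi?nickname=Kim&message=my_msg.txt"
```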
Hard to solve post problems |
If you don't want to use files to store (some of) your parameters but still want to use the post protocol, just end the command line with -post.
If the .cgi you want to talk to is so dumb that the order of parameters counts, and you still insist on using non-file parameters, here's what you can do...
Let's say that parm1 is non-file, parm2 is file and parm3 is non-file (this is the worst possible case). Then type:
That is: only the last URL counts, but the parameter(s) of the first, dummy URL are nevertheless added to the list of parameters.
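Following that rule, a reconstructed command line might look like the one below (site, script and file-parameter syntax all assumed): the dummy first URL contributes parm1, the last URL contributes parm2 (read from a file) and parm3, so the script receives them in the order parm1, parm2, parm3:

```
geturl "http://www.a_site.com/cgi-bin/script.cgi?parm1=value1" "http://www.a_site.com/cgi-bin/script.cgi?parm2=parm2_file.txt&parm3=value3" -post
```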
Gotchas |
Implicit URL ending |
Sooner or later, you'll try this (a URL with an implicit file separator and index.html):
A browser traps the resulting error automatically and retries with the alternative URL the server suggests (usually the same URL with '/index.html' appended). GetURL doesn't do this (in version 1.3.3 at least), so you simply get the error page the server returns.
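In other words (hypothetical site), the first fetch below returns the server's error page; spell out the full path yourself, as in the second:

```
geturl http://www.a_site.com/a_directory
geturl http://www.a_site.com/a_directory/index.html
```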
Legal status |
GetURL is freeware. You can use it as much as you want, redistribute it, use it in commercial apps, or modify it to serve your own purposes.
There are only three requirements.
Download geturl |