wget: some quick tips

Date: October 11, 2007

wget is one of my favorite tools in *nix land. Sometimes you want to convert a dynamic site to static HTML. Sometimes you want to download all the .rpm, .deb, .iso, or .tgz files in a directory. Other times you just want to create an archive of a site. wget does it all!

Here are some of my favorite wget command options, and what they do:


$ wget -r -np -nd http://example.com/packages/

This little gem is probably my most used variation. It recursively (-r) downloads all the files linked from the /packages/ directory on example.com, without ascending to parent directories (-np) and without recreating the remote directory structure on your machine (-nd).
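On a plain directory listing, a couple of extra options keep things tidy. The sketch below wraps the command in a hypothetical shell function, fetch_dir, that adds -l 1 (follow links only one level deep, which assumes the files sit directly in that listing) and -R 'index.html*' (reject the auto-generated index pages so they don't clutter the download directory):

```shell
# fetch_dir URL   (hypothetical helper; put it in ~/.bashrc or similar)
# -r    recurse into the links found on the page
# -np   never ascend to the parent directory
# -nd   save every file flat in the current directory
# -l 1  only follow links one level deep (assumes a flat listing)
# -R    reject the generated index pages instead of keeping them
fetch_dir() {
    wget -r -np -nd -l 1 -R 'index.html*' "$1"
}
```

$ fetch_dir http://example.com/packages/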


$ wget -r -np -nd --accept=iso http://example.com/centos-5/i386/

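Adding the --accept argument with a comma-separated list of file suffixes (for example --accept=rpm,deb,iso,tgz) grabs only the files ending in one of those suffixes. wget still has to fetch the HTML index pages to find the links, but it deletes them once they have been read.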
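Since --accept takes a single comma-separated list, a small wrapper can build that list from separate arguments. This is only a sketch: fetch_types is a hypothetical name, and the local IFS trick assumes a shell that supports local, such as bash or dash.

```shell
# fetch_types URL SUFFIX...   (hypothetical helper)
# Joins the suffixes with commas, because --accept wants a list
# like "rpm,deb,iso,tgz", then grabs only the matching files.
fetch_types() {
    local url="$1"
    shift
    local IFS=,          # "$*" now joins the remaining args with commas
    wget -r -np -nd --accept="$*" "$url"
}
```

$ fetch_types http://example.com/packages/ rpm deb iso tgz

runs wget -r -np -nd --accept=rpm,deb,iso,tgz http://example.com/packages/ under the hood.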

Another way to grab just the files you want:

$ wget -i filename.txt

Put the URLs you want in filename.txt, one per line, and run wget against it to download the whole list automatically.
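The list can also be generated rather than typed by hand. As a sketch, here is a hypothetical make_list function that writes a numbered series of URLs into filename.txt; the discN.iso naming is made up for illustration, so adjust the printf pattern to whatever the real files are called.

```shell
# make_list BASE_URL COUNT   (hypothetical helper)
# Writes BASE_URL/disc1.iso ... BASE_URL/discCOUNT.iso, one per line,
# into filename.txt, ready for `wget -i filename.txt`.
make_list() {
    local base="$1" count="$2" n=1
    while [ "$n" -le "$count" ]; do
        printf '%s/disc%d.iso\n' "$base" "$n"
        n=$((n + 1))
    done > filename.txt
}
```

$ make_list http://example.com/centos-5/i386 6
$ wget -i filename.txt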

On a bad connection?

$ wget -c http://example.com/really-big-file.iso

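The “-c” option tells wget to continue a partially downloaded file instead of starting over. Within a single run, wget already retries a dropped connection on its own (20 tries by default, adjustable with --tries); -c is what lets you rerun the same command after it gives up and pick up where it left off.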
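Because -c resumes from the partial file, simply rerunning wget is a safe way to ride out a flaky link. A minimal sketch of that loop follows; fetch_resume, MAX_ATTEMPTS and RETRY_DELAY are hypothetical names, and the defaults of 10 attempts and 5 seconds are arbitrary choices.

```shell
# fetch_resume URL   (hypothetical helper)
# Reruns `wget -c` until it succeeds; each run resumes from the
# partial file left by the previous one instead of starting over.
fetch_resume() {
    local attempts=0
    until wget -c "$1"; do
        attempts=$((attempts + 1))
        if [ "$attempts" -ge "${MAX_ATTEMPTS:-10}" ]; then
            echo "giving up on $1 after $attempts attempts" >&2
            return 1
        fi
        sleep "${RETRY_DELAY:-5}"   # give the connection a moment
    done
    return 0   # only reached once wget has succeeded
}
```

$ fetch_resume http://example.com/really-big-file.iso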


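$ wget -m -k http://www.example.com/

Mirror a site (-m), converting its links to work locally (-k) so that you can move the site to another server. Add the ‘-H’ option if images are loaded from another site, but pair it with -D and a comma-separated list of allowed domains; otherwise -H lets the recursive mirror follow links onto every host the site points to.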
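Here is one way to combine the mirroring options for a site whose images sit on a separate host. It is a sketch: mirror_site is a hypothetical name, -p (page requisites) and -E (save text/html pages with a .html suffix, handy when converting a .php or .asp site) are extra options not in the command above, and the domain list you pass decides how far -H is allowed to roam.

```shell
# mirror_site URL DOMAINS   (hypothetical helper)
# -m  mirror: recursion, infinite depth, timestamping
# -k  convert links so the copy can be browsed or moved elsewhere
# -E  save text/html pages with a .html suffix
# -p  also fetch page requisites such as images and CSS
# -H  allow wget to span to other hosts...
# -D  ...but only to the comma-separated DOMAINS given
mirror_site() {
    wget -m -k -E -p -H -D "$2" "$1"
}
```

$ mirror_site http://www.example.com/ example.com,static.example.net

Here static.example.net is a placeholder for wherever the images actually live.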

Another useful tool for mirroring websites is HTTrack, which I blogged about a couple of weeks ago.

10 Responses to “wget: some quick tips”

  1. Sudar said:

    Thanks for sharing your tip. Was really helpful.


    Sudar

  2. NxT said:

    -erobots=off makes wget ignore robots.txt

  3. David said:

    Thanks. This is nicer than digging through the man page.

  4. links for 2007-10-12 :: jason brown said:

    [...] wget: some quick tips Sometimes you want to convert a dynamic site to html… (tags: download leet network software tips tool tutorial unix web howto) [...]

  5. Web 2.0 Feedback Loop » Blog Archive » links for 2007-10-13 said:

    [...] wget: some quick tips » Tip o’ the Day (tags: bash wget) [...]

  6. Riccardo Giuntoli said:

    This option can also be very useful:

    --user-agent=agent-string
    Identify as agent-string to the HTTP server.

    Some sites block wget’s default user agent because they don’t want to be mirrored, but with this option you can send whatever agent string you like and keep mirroring without trouble.

    Regards, Riccardo Giuntoli

  7. Matt said:

    Thanks for the great tips.

    I will start using wget to mirror sites!

    I wonder if there is a windows port of wget so I can use it at work.

    -Matt

  8. Vayn Tech Bookmarks » wget usage tips said:

    [...] [Reposted from LinuxToy: http://linuxtoy.org/archives/wget-tips.html][via] [...]

  9. Pádraig Brady said:

    Very similar: http://www.pixelbeat.org/cmdline.html#wget

  10. wget usage tips | LinuxDig said:

    [...] [via] [...]
