A friend needed to move her website from one hosting provider to another. The problem was that she used a website builder on the first provider so she didn’t have the content on her computer; she recoiled at the thought of re-entering it. wget to the rescue!
GNU Wget is a tool that can download a web page or an entire site over HTTP or HTTPS, as a browser would, or over FTP (the File Transfer Protocol) if a site allows it.
If I wanted to retrieve just a single page from a website, I could do that easily.
wget https://www.example.com
would retrieve the main page for the site, probably a file called index.html.
If I wanted to get a particular file or page from a site, I could specify it instead:
wget https://www.example.com/page.html
If I wanted all of example.com, I could use
wget -r https://www.example.com
That could represent a lot of data, though! Fortunately, by default wget follows links only five levels deep unless I tell it to go further (with -l). There are also options to retrieve the whole site (-m for mirror) or to convert the site for local viewing by making all the internal links refer to the copy (-k).
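To make those options concrete, here is a minimal sketch of a shell function that combines them with -p and -P from the list below. The function name mirror_site and the WGET override variable are my own inventions for illustration, not anything wget provides.

```shell
# mirror_site URL DEST
# Sketch only: mirror_site and the WGET variable are hypothetical names.
#   -m  mirror the site (recursive, no depth limit, with timestamping)
#   -k  convert links so the local copy can be browsed offline
#   -p  also fetch the images, stylesheets, etc. that each page needs
#   -P  save everything under the DEST folder
mirror_site() {
  ${WGET:-wget} -m -k -p -P "$2" "$1"
}
```

Calling mirror_site https://www.example.com site-copy would start the download into a folder named site-copy (a made-up name). Setting WGET=echo first makes the function print its options instead of downloading, which is a cheap way to check them.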
Some options I tend to use include:
-A  wget -r -A.pdf https://www.example.com  retrieves just the PDF files found
-c  wget -c https://www.example.com/bigdoc.pdf  resumes downloading bigdoc.pdf after a failure
-p  wget -p https://www.example.com/page.html  grabs all the files necessary to display page.html
-P  wget -P c:\tmp\page -p https://www.example.com/page.html  as above, but puts the files in c:\tmp\page instead of the current directory/folder
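These options can be combined. As a rough sketch, the function below collects the PDF files from one part of a site into a folder of my choosing. The name fetch_pdfs, the WGET override variable, and the depth of two levels are assumptions made for this example. Note that -c is the option that continues a partial download, while -b sends wget to the background.

```shell
# fetch_pdfs URL DEST
# Sketch only: fetch_pdfs and the WGET variable are hypothetical names.
#   -r       follow links recursively
#   -l 2     but only two levels deep (an arbitrary choice for this example)
#   -np      never climb above the starting directory
#   -A.pdf   keep only files whose names end in .pdf
#   -c       continue partially downloaded files instead of starting over
#   -P       save everything under the DEST folder
fetch_pdfs() {
  ${WGET:-wget} -r -l 2 -np -A.pdf -c -P "$2" "$1"
}
```

For example, fetch_pdfs https://www.example.com/docs/ pdfs would look for PDF files under /docs/ (a hypothetical path) and save them beneath a folder called pdfs.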
Wget is free software and covered by the GNU General Public License (GPL).
Linux, Mac, and UNIX users can install wget using their favorite package manager.
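Which command that is depends on the system. The sketch below checks for a few common package managers and prints the install command I would expect to work. The function name wget_install_hint is hypothetical, and the list is a small sample rather than a complete one.

```shell
# wget_install_hint: print a likely install command for this machine.
# Sketch only: the function name is hypothetical, and the package managers
# checked here are just a few common ones.
wget_install_hint() {
  if command -v wget >/dev/null 2>&1; then
    echo "wget is already installed"
  elif command -v apt-get >/dev/null 2>&1; then
    echo "sudo apt-get install wget"      # Debian, Ubuntu
  elif command -v dnf >/dev/null 2>&1; then
    echo "sudo dnf install wget"          # Fedora
  elif command -v pacman >/dev/null 2>&1; then
    echo "sudo pacman -S wget"            # Arch
  elif command -v brew >/dev/null 2>&1; then
    echo "brew install wget"              # macOS with Homebrew
  else
    echo "no known package manager found; see your system's documentation"
  fi
}
```

Running wget_install_hint prints a single line. The function only suggests a command and never installs anything itself.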
Windows users will need to go to the download page for Windows and choose what they want to install. For most people that will be the “Complete package, except sources”. I clicked on the link and my browser asked whether I wanted to download it. I said I did, and the installer showed up in my downloads folder. I ran the installer, approved the license, and answered the questions by accepting the defaults. That’s it.
Wget itself is a command-line program, so I have to start cmd to use it.
At the time of writing, there is a newer version for Windows at https://eternallybored.org/misc/wget/. I tried it and it seemed to work.
Wget is a powerful tool for website maintenance and file retrieval. My friend was quite grateful for my help in mirroring her site. Hopefully, this essential intro to wget will help you with your website maintenance or help you retrieve files from the web more easily.