Morning all,
I am following a book called Network Security Assessment and I am stuck on a particular section.
The author mentions using wget to crawl and scrape a website, which sounds like it could be quite useful; however, the command provided in the book does not work as expected.
The author says to use the following command to crawl and scrape the entire contents of a website:
wget -r -m -nv http://www.example.org

Then use the tree command, which should show all the pages within the website.

The above wget command only downloads the index.html file; it does not download all the files.

I have tried the wget man pages but have had no luck finding a solution.

Has anyone seen the above before?

Many thanks

Reply from peterw2300:

It could be that the website is restricting your activities, and there are examples of how to get that command working here: Download a whole website with wget (or other) including all its downloadable content - Ask Ubuntu.

You can also remove the -nv so that you can see what is happening. To get the full range of options, type "wget --help".

Reply from jessevas:

-r should be -R if you want recursive search.
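A note on the flags, and a fuller example: in GNU wget, -r/--recursive is already the recursion option (-R is the reject list) and -m/--mirror implies -r, so the original command is valid as written. When only index.html comes down, the usual causes are the site's robots.txt (which wget honours during recursive retrieval) or the front page linking to a different hostname. The sketch below follows the general approach in the Ask Ubuntu thread linked above; the flag choices, wait time, and user-agent string are illustrative rather than the book's exact command, and www.example.org is just the placeholder from the original post.

# Mirror the site, fetch page assets, and rewrite links for local browsing.
# -e robots=off tells wget to ignore robots.txt, which it otherwise respects
# during recursive downloads; only use this on sites you are authorised to assess.
wget --mirror \
     --page-requisites \
     --convert-links \
     --adjust-extension \
     --no-parent \
     --wait=1 \
     --user-agent="Mozilla/5.0" \
     -e robots=off \
     http://www.example.org

# The mirror lands in a directory named after the host, so the book's next step is:
tree www.example.org

If it still stops at index.html, the links on the front page may point at a different hostname (for example example.org without the www), in which case --span-hosts together with --domains can be added, or the links may be generated by JavaScript, which wget cannot follow.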