Don't know how to do this? Google will help.

Installing The Necessary Python Libraries

Now we are ready to install the various Python libraries that we need:

pip install bs4 requests pandas pyexifinfo waybackpack

Alright, let's get down to it, shall we?

Coding It Up

Now crack open a new Python file, call it waybackimages.py (download the source here) and start pounding out (use both hands) the following:

Line 43: we set up our get_image_paths function to receive the Pack object.

Lines 48-56: we walk through the list of assets (48) and use the get_archive_url function (51) to hand us a usable URL. We print out a little helper message (53) and then retrieve the HTML using the fetch function (56).

Lines 59-60: now that we have the HTML, we hand it off to BeautifulSoup (59) so that we can begin parsing it for image tags. The parsing is handled by the findAll function (60), passing in the img tag. This produces a list of all IMG tags discovered in the HTML.

Lines 63-70: we walk over the list of IMG tags found (64) and build URLs to the images (67-70) that we can use later to retrieve the images themselves.

Lines 72-74: if we don't already have the image URL (72), we print out a message (73) and then add the image URL to our list of all images found (74).

Alright! Now that we have extracted all of the image URLs that we can, we need to download them and process them for EXIF data.

Line 83: we define our download_images function, which takes in our big list of image URLs and the original URL we are interested in.
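To make the parsing steps above concrete, here is a minimal sketch of the IMG-tag extraction and deduplication logic (the walkthrough's lines 59-74). This is not the author's exact source: the helper name extract_image_urls and the relative-URL joining are assumptions, and it is shown against a canned HTML snippet so it runs without touching the Wayback Machine.

```python
# Sketch of the image-URL extraction described in lines 59-74 (assumed names).
from bs4 import BeautifulSoup

def extract_image_urls(html, archive_url):
    """Parse HTML for IMG tags and build full URLs to each image."""
    soup = BeautifulSoup(html, "html.parser")   # lines 59-60: parse the HTML
    image_urls = []
    for img in soup.findAll("img"):             # lines 63-70: walk IMG tags
        src = img.get("src", "")
        if not src:
            continue
        # Assumption: relative paths are joined to the archived page's URL.
        if src.startswith("http"):
            full_url = src
        else:
            full_url = archive_url.rstrip("/") + "/" + src.lstrip("/")
        if full_url not in image_urls:          # lines 72-74: dedupe + record
            print("[*] Found image: %s" % full_url)
            image_urls.append(full_url)
    return image_urls

sample_html = ('<html><body><img src="/images/logo.png">'
               '<img src="http://example.com/pic.jpg"></body></html>')
urls = extract_image_urls(
    sample_html, "https://web.archive.org/web/2016/http://example.com")
```

The deduplication check matters because the same image often appears in many archived snapshots of a page, and we only want to download and EXIF-process each one once.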