To begin to familiarize yourself with how this web page is set up, you can take a look at its DOM, which will help you understand how the HTML is structured. The Requests library allows you to make use of HTTP within your Python programs in a human-readable way, and the Beautiful Soup module is designed to get web scraping done quickly.

We will import both Requests and Beautiful Soup with the import statement. Because the URL is lengthy, the code throughout this tutorial will not pass the PEP 8 line-length check, which flags lines longer than 79 characters.

You may want to assign the URL to a variable to make the code more readable in final versions. The code in this tutorial is for demonstration purposes and will allow you to swap out shorter URLs as part of your own projects. This object takes as its arguments the page. Whatever data you would like to collect, you need to find out how it is described by the DOM of the web page.
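As a minimal sketch of the steps described so far, the block below imports both libraries, wraps the download in a helper, and builds a parse tree. The helper name `make_soup`, the use of `page.text`, the choice of Python's built-in `html.parser`, and the sample HTML string are assumptions made for illustration; the demonstration parses an inline string so that it runs without a network connection.

```python
import requests
from bs4 import BeautifulSoup


def make_soup(url):
    """Download a page and return its parse tree (hypothetical helper)."""
    page = requests.get(url, timeout=10)
    page.raise_for_status()  # stop early on HTTP errors such as 404
    # page.text is the body of the server's response as a string;
    # "html.parser" is the parser bundled with Python (an assumed choice).
    return BeautifulSoup(page.text, "html.parser")


# Offline demonstration on a small, made-up HTML string.
sample_html = "<html><body><h1>Artists</h1></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")
print(soup.h1.get_text())
```

In your own project you could assign the page address to a variable and pass it to the helper, for example `soup = make_soup(url)`.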

Within the context menu that pops up, you should see a menu item similar to Inspect Element (Firefox) or Inspect (Chrome). Once you click on the relevant Inspect menu item, the tools for web developers should appear within your browser. This is important to note so that we only search for text within this section of the web page.
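Restricting a search to one section of a page can be sketched as follows. The class name `BodyText`, the `Sidebar` block, the link targets, and the second artist entry are all made up for illustration; substitute whatever the Inspect tool shows for the page you are working with.

```python
from bs4 import BeautifulSoup

# Made-up HTML: one section we care about and one we want to ignore.
sample_html = """
<div class="Sidebar"><a href="/about.htm">About</a></div>
<div class="BodyText">
  <a href="/artist-1.htm">Zabaglia, Niccola</a>
  <a href="/artist-2.htm">Second, Artist</a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Find the section first, then search for links only inside it.
section = soup.find(class_="BodyText")
links = section.find_all("a")

names = [link.get_text() for link in links]
print(names)
```

Because `find_all` is called on the section rather than on the whole document, the link in the sidebar is never collected.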

We also notice that the name Zabaglia, Niccola is in a link tag, since the name references a web page that describes the artist. We can therefore use Beautiful Soup to find the AlphaNav class and use the decompose method, which removes a tag from the parse tree and then destroys it along with its contents.
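The effect of the decompose method can be illustrated with a short sketch. The HTML below is made up; only the class name AlphaNav comes from the text above, and the assumption that it wraps a table of navigation links is for illustration only.

```python
from bs4 import BeautifulSoup

# Made-up HTML: an artist link plus a navigation block we want to drop.
sample_html = """
<div>
  <a href="/artist-1.htm">Zabaglia, Niccola</a>
  <table class="AlphaNav"><tr><td><a href="/a.htm">A</a></td></tr></table>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Locate the navigation block and remove it, contents included.
nav = soup.find(class_="AlphaNav")
if nav is not None:
    nav.decompose()

# Only links outside the removed block remain in the parse tree.
remaining = [a.get_text() for a in soup.find_all("a")]
print(remaining)
```

Checking for `None` before calling `decompose()` keeps the sketch from failing on a page where the class is absent.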

We can run the program with the python command to view its output.

However, what if we want to also capture the URLs associated with those artists? From the output of the links above, we know that the entire URL is not being captured, so we will concatenate the link string with the front of the URL string (in this case https:).

Output
Zabaglia, Niccola https:

Comma-separated values (CSV) files allow us to store tabular data in plain text, and are a common format for spreadsheets and databases.
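The link concatenation described above can be sketched as follows. The prefix `https://example.org` and the link target are hypothetical placeholders, not the address used in this tutorial; the sample HTML is likewise made up.

```python
from bs4 import BeautifulSoup

# Hypothetical front of the URL; replace it with the real site prefix.
URL_PREFIX = "https://example.org"

# Made-up link whose href holds only the tail of the address.
sample_html = '<a href="/artists/zabaglia.htm">Zabaglia, Niccola</a>'
soup = BeautifulSoup(sample_html, "html.parser")

results = []
for link in soup.find_all("a"):
    name = link.get_text()
    # The href attribute is incomplete, so prepend the prefix.
    full_url = URL_PREFIX + link.get("href")
    results.append((name, full_url))
    print(name)
    print(full_url)
```

Plain string concatenation works when every href starts with a slash; for mixed relative and absolute links, `urllib.parse.urljoin` from the standard library is a more forgiving alternative.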

Before beginning with this section, you should familiarize yourself with how to handle plain text files in Python.

Instead, a file called z-artist-names will be created in the directory you are working in.
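A minimal sketch of writing the collected names and links as CSV is shown below. The helper name `write_artists`, the column headings, and the sample row are assumptions for illustration. The demonstration writes to an in-memory buffer so that it leaves no file behind.

```python
import csv
import io


def write_artists(file_obj, rows):
    """Write a header row followed by (name, link) rows (hypothetical helper)."""
    writer = csv.writer(file_obj)
    writer.writerow(["Name", "Link"])  # assumed column headings
    writer.writerows(rows)


# Made-up sample data standing in for the scraped results.
rows = [("Zabaglia, Niccola", "https://example.org/artists/zabaglia.htm")]

buffer = io.StringIO()
write_artists(buffer, rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same helper could be called inside `with open("z-artist-names.csv", "w", newline="") as f:`, where the `.csv` extension is an assumption. Note that the csv module quotes the name automatically because it contains a comma.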

Depending on what you use to open it, it may look like plain comma-separated text, or it may look more like a spreadsheet.

Retrieving Related Pages

We have created a program that will pull data from the first page of the list of artists whose last names start with the letter Z.

However, there are 4 pages in total of these artists available on the website.
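One way to cover all four pages is to build each page address in a loop. The sketch below assumes, purely for illustration, that the pages differ only by a number near the end of the address; the base URL and the naming pattern are hypothetical and should be checked against the real site.

```python
# Hypothetical base address and naming pattern for the four pages.
BASE_URL = "https://example.org/artists/z"
TOTAL_PAGES = 4

# Build one address per page: ...z1.htm through ...z4.htm
page_urls = [f"{BASE_URL}{number}.htm" for number in range(1, TOTAL_PAGES + 1)]

for url in page_urls:
    print(url)
```

Each address in `page_urls` could then be fetched and parsed in turn, with the scraping steps from the first page repeated inside the loop.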

