Find Elements by Class Name and Text Content

Not all of the job listings are developer jobs. Instead of printing out all the jobs listed on the website, you’ll first filter them using keywords. You know that job titles in the page are kept within <h2> elements.
import requests
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)

# Parse the downloaded HTML and grab the container that holds the job cards.
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="ResultsContainer")
print(results)
To filter for only specific jobs, you can use the string argument of find_all():
python_jobs = results.find_all("h2", string="Python")

# This code finds all <h2> elements where the contained string matches
# "Python" exactly. Note that you're calling the method directly on your
# first results variable. If you print() the output of this snippet to
# your console, you might be disappointed, because it'll be empty:
print(python_jobs)

# There was a Python job in the search results, so why is it not showing up?
# When you use string= as you did above, your program looks for that string
# exactly. Any difference in spelling, capitalization, or whitespace will
# prevent the element from matching. In the next section, you'll find a way
# to make your search string more general.
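To see the exact-match behavior in isolation, here is a minimal sketch that runs against a small hand-written HTML snippet instead of the live page. The markup and the SAMPLE_HTML name are made up for illustration and only mimic the structure described above; they are not taken from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup, not the real fake-jobs page.
SAMPLE_HTML = """
<div id="ResultsContainer">
  <h2 class="title">Senior Python Developer</h2>
  <h2 class="title">Python</h2>
  <h2 class="title"> Python </h2>
  <h2 class="title">python</h2>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = soup.find(id="ResultsContainer")

# string="Python" only matches an <h2> whose text is exactly "Python".
exact_matches = results.find_all("h2", string="Python")
matched_titles = [str(h2.string) for h2 in exact_matches]
print(matched_titles)
```

Of the four headings in this sample, only the one whose text is exactly "Python" should come back. The longer title, the padded one, and the lowercase one all differ from the search string, which is why the same call returns nothing on the real page.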
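In the version below, you pass an anonymous function to the string= argument. The lambda function looks at the text of each <h2> element, converts it to lowercase, and checks whether the substring "python" is found anywhere in it.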
import requests
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="ResultsContainer")

# Keep every <h2> whose text contains "python", ignoring capitalization.
python_jobs = results.find_all(
    "h2", string=lambda text: "python" in text.lower()
)
print(python_jobs)
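The same filter can be tried offline against a small hand-written snippet. The sketch below is one possible variation, not the code from the tutorial: the markup, SAMPLE_HTML, and the mentions_python helper are hypothetical names introduced here. The helper does the same lowercase substring check as the lambda, with an extra `is not None` guard added as a defensive assumption, since an element that contains nested tags may not expose a single string.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup, not the real fake-jobs page.
SAMPLE_HTML = """
<div id="ResultsContainer">
  <h2 class="title">Senior Python Developer</h2>
  <h2 class="title">Energy engineer</h2>
  <h2 class="title"> PYTHON programmer </h2>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = soup.find(id="ResultsContainer")


def mentions_python(text):
    """Return True if the element text contains "python" in any capitalization."""
    # Guard against None (an assumption for robustness, not part of the original lambda).
    return text is not None and "python" in text.lower()


python_jobs = results.find_all("h2", string=mentions_python)

# Strip surrounding whitespace so the titles print cleanly.
titles = [job.string.strip() for job in python_jobs]
print(titles)
```

Because the check lowercases the text and looks for a substring, differences in capitalization and surrounding whitespace no longer prevent a match, while the unrelated "Energy engineer" heading is still filtered out.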