Beautiful Soup search()

Web Scraping and Beautiful Soup

Beautiful Soup searching problem

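import requests
from bs4 import BeautifulSoup

# Download the fake job board and parse its HTML.
URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Narrow the search to the element with id="ResultsContainer".
results = soup.find(id="ResultsContainer")
print(results)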

Find Elements by Class Name and Text Content

Not all of the job listings are developer jobs. Instead of printing out all the jobs listed on the website, you’ll first filter them using keywords. You know that job titles in the page are kept within <h2> elements. To filter for only specific jobs, you can use the string argument:

python_jobs = results.find_all("h2", string="Python")

This code finds all <h2> elements where the contained string matches “Python” exactly. Note that you’re directly calling the method on your first results variable. If you go ahead and print() the output of the above code snippet to your console, then you might be disappointed because it’ll be empty:

print(python_jobs)

There was a Python job in the search results, so why is it not showing up? When you use string= as you did above, your program looks for that string exactly. Any differences in the spelling, capitalization, or whitespace will prevent the element from matching. In the next section, you’ll find a way to make your search string more general.
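The exact-match behavior is easier to see on a small fragment that needs no network access. The following is a minimal sketch: the HTML and the job titles in it are made up for illustration and only loosely imitate the structure of the fake jobs page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup written for this example, not copied from the real page.
SAMPLE_HTML = """
<div id="ResultsContainer">
  <h2 class="title">Senior Python Developer</h2>
  <h2 class="title">Python</h2>
  <h2 class="title"> Python </h2>
  <h2 class="title">python</h2>
  <h2 class="title">Energy Engineer</h2>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = soup.find(id="ResultsContainer")

# string="Python" compares the whole text of each <h2> with "Python".
exact_matches = results.find_all("h2", string="Python")

print(len(exact_matches))
print(exact_matches)
```

Only the heading whose text is exactly Python should come back. The longer title, the padded one, and the lowercase one are all skipped, which is the same reason the search against the real page returns nothing.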

Pass a Function to a Beautiful Soup Method

Instead of an exact string, you can pass an anonymous function to the string= argument. The lambda function looks at the text of each <h2> element, converts it to lowercase, and checks whether the substring “python” is found anywhere. You can check whether you managed to identify all the Python jobs with this approach:

import requests
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="ResultsContainer")

# Keep every <h2> whose text contains "python", ignoring case.
python_jobs = results.find_all(
    "h2", string=lambda text: "python" in text.lower()
)
print(python_jobs)
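The lambda above assumes every <h2> holds plain text. As a hedged sketch of a more defensive variant, the example below guards against a missing string and also shows a compiled regular expression, which string= accepts as well. The sample HTML and the helper name mentions_python are hypothetical and exist only for this illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup written for this example, not copied from the real page.
SAMPLE_HTML = """
<div id="ResultsContainer">
  <h2 class="title">Senior Python Developer</h2>
  <h2 class="title">python scripting intern</h2>
  <h2 class="title">Energy Engineer</h2>
  <h2 class="title">Lead <em>Python</em> Engineer</h2>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = soup.find(id="ResultsContainer")


def mentions_python(text):
    """Return True when text is a string containing "python" in any case."""
    # A tag with nested markup has no single string, so text can be None.
    return text is not None and "python" in text.lower()


# Variant 1: a named function instead of a lambda.
by_function = results.find_all("h2", string=mentions_python)

# Variant 2: a case-insensitive regular expression.
by_regex = results.find_all("h2", string=re.compile("python", re.IGNORECASE))

titles_function = [h2.get_text(strip=True) for h2 in by_function]
titles_regex = [h2.get_text(strip=True) for h2 in by_regex]

print(titles_function)
print(titles_regex)
```

Both variants should pick up the two plain-text Python titles. The heading that wraps the word in <em> is skipped by both, because string= only looks at a tag's single direct string. If headings with nested markup matter, filtering on get_text() in a list comprehension is one alternative.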