Web Scraping Company Press Release + (Beginner) Text Analysis with Python

Photo by Cookie the Pom on Unsplash

Company press releases can be an important source of information when making investment decisions. Much of this information is published online, so learning web scraping is useful for research in many fields.

  • Simple Text Analysis with NLTK

Requests and Beautiful Soup are the go-to modules for a simple web scraping program. The requests module is used to download files and web pages while Beautiful Soup parses HTML.

# import required modules
import requests
import bs4
# get Response object using requests.get()
pr_res = requests.get('https://investors.etsy.com/press-releases/press-release-details/2020/Etsy-to-Announce-Third-Quarter-2020-Financial-Results-on-October-28-2020/default.aspx')
# create Soup object to parse HTML
pr_soup = bs4.BeautifulSoup(pr_res.text, 'html.parser')
# only get text content inside <p> sections
pr_text = pr_soup.select('.xn-content p')
# consolidate the text
pr_final = []
for body in pr_text:
    pr_final.append(body.getText())
# convert list to a string
pr_string = ' '.join(pr_final)
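
To try the parsing step without hitting a live website, the same idea can be wrapped in a small function and run against an inline HTML string. This is only a sketch: the sample markup, the function name extract_release_text, and the whitespace clean-up are assumptions added for illustration, and the '.xn-content p' selector is taken from the code above (other sites will likely need a different selector).

```python
import bs4

# Hypothetical sample markup standing in for a downloaded press release page.
SAMPLE_HTML = """
<html><body>
  <div class="xn-content">
    <p>First paragraph of the release.</p>
    <p>Second   paragraph with extra spaces.</p>
  </div>
  <p>Footer text outside the release body.</p>
</body></html>
"""


def extract_release_text(html, selector='.xn-content p'):
    """Return the text of every element matching selector as one string."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    paragraphs = soup.select(selector)
    # split() followed by join collapses runs of whitespace in each paragraph
    cleaned = [' '.join(p.get_text().split()) for p in paragraphs]
    return ' '.join(cleaned)


sample_text = extract_release_text(SAMPLE_HTML)
print(sample_text)
```

Because the selector only matches <p> tags nested inside the element with class xn-content, the footer paragraph is left out of the result. With a real page, pr_res.text would be passed in place of SAMPLE_HTML.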
