Add a function called readPdf to toolbox that reads a PDF paper into a str, then cleans the content with bs4.BeautifulSoup.
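
A minimal sketch of what readPdf could look like. The note does not name a PDF library, so pypdf is an assumption here, and clean_text is a hypothetical helper name. The whitespace collapsing is also an assumption about what "clean" means, beyond dropping any markup remnants with BeautifulSoup.

```python
import re

from bs4 import BeautifulSoup


def clean_text(raw: str) -> str:
    """Strip markup remnants and collapse whitespace (hypothetical helper)."""
    # html.parser ships with Python, so no extra parser dependency is needed.
    text = BeautifulSoup(raw, "html.parser").get_text(separator=" ")
    # Assumption: runs of spaces and newlines are noise from PDF extraction.
    return re.sub(r"\s+", " ", text).strip()


def readPdf(path: str) -> str:
    """Read every page of the PDF at `path` and return the cleaned text."""
    # Assumption: pypdf is the PDF backend; the note does not specify one.
    # Imported lazily so clean_text stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # Fall back to "" for pages with no extractable text (e.g. scanned images).
    pages = [page.extract_text() or "" for page in reader.pages]
    return clean_text("\n".join(pages))
```

Usage would be along the lines of `text = readPdf("paper.pdf")`, where the path is a placeholder. Keeping the cleaning step in its own function makes it possible to check it on plain strings without a PDF file.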