Tools and libraries used for Python web scraping

彭世瑜 Published: 2018-04-23 00:40:58

Tools and libraries to install

Development tools

python https://www.python.org/
pycharm https://www.jetbrains.com/pycharm/
Both can be downloaded and installed directly from their official websites.

Built-in libraries

urllib re

>>> from urllib.request import urlopen
>>> response = urlopen("http://www.baidu.com")
>>> response
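
The two built-in modules are often used together: urllib downloads the page and re pulls values out of the HTML. The following is only a minimal sketch of that combination. The helper names (build_request, extract_links, fetch_links), the href pattern, and the User-Agent value are illustrative assumptions, not part of the original post, and the pattern only handles double-quoted href attributes.

```python
import re
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Hypothetical pattern: matches double-quoted href values only.
HREF_PATTERN = re.compile(r'href="([^"]+)"')


def build_request(url):
    """Create a request with a browser-like User-Agent (illustrative value)."""
    return Request(url, headers={"User-Agent": "Mozilla/5.0"})


def extract_links(html, base_url):
    """Return absolute URLs for every double-quoted href found in html."""
    return [urljoin(base_url, href) for href in HREF_PATTERN.findall(html)]


def fetch_links(url, timeout=10):
    """Download a page and return its links (requires network access)."""
    with urlopen(build_request(url), timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html, url)


# Offline demonstration on a small HTML fragment.
sample = '<a href="/doc/">Docs</a> <a href="https://www.python.org/">Python</a>'
links = extract_links(sample, "http://www.baidu.com")
print(links)
```

For anything beyond simple patterns, a real parser such as the ones listed under the HTML parsing libraries is usually a safer choice than regular expressions.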

HTTP request library

requests http://cn.python-requests.org/zh_CN/latest/

>>> import requests
>>> response = requests.get("http://www.baidu.com")
>>> response

Browser automation tools

selenium https://www.seleniumhq.org/

chromedriver
Official Google site: https://sites.google.com/a/chromium.org/chromedriver/downloads
Taobao mirror: https://npm.taobao.org/mirrors/chromedriver/

>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get("http://www.baidu.com")
>>> driver.get("https://www.python.org")
>>> html = driver.page_source

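phantomjs http://phantomjs.org/ (development has been suspended, and webdriver.PhantomJS was removed in Selenium 4, so the example below needs an older Selenium release)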

>>> from selenium import webdriver
>>> driver = webdriver.PhantomJS()
>>> driver.get("http://www.baidu.com")
>>> html = driver.page_source

HTML parsing libraries

lxml http://lxml.de/
beautifulsoup4 https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/

>>> from bs4 import BeautifulSoup as BS
>>> html = "<html><h1>title</h1></html>"
>>> soup = BS(html, "lxml")
>>> soup.h1
<h1>title</h1>

pyquery https://pythonhosted.org/pyquery/

>>> from pyquery import PyQuery as pq
>>> html = "<html><h1>title</h1></html>"
>>> doc = pq(html)
>>> doc("html").text()
'title'
>>> doc("h1").text()
'title'

Databases

mysql https://dev.mysql.com/downloads/mysql/
redis https://redis.io/
mongodb https://www.mongodb.com/
On macOS, these can be installed with brew https://docs.brew.sh/

Database client packages:

pymysql https://pypi.org/project/PyMySQL/

>>> import pymysql

>>> conn = pymysql.connect(host="localhost",
...     user="root", password="123456",
...     port=3306, db="demo")
>>> cursor = conn.cursor()

>>> sql = "select * from mytable"
>>> cursor.execute(sql)
3
>>> cursor.fetchone()
(1, datetime.date(2018, 4, 14))

>>> cursor.close()
>>> conn.close()

pymongo http://api.mongodb.com/python/current/index.html

>>> import pymongo

>>> client = pymongo.MongoClient("localhost")
>>> db = client["newtestdb"]

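>>> # Collection.insert() was removed in PyMongo 4; insert_one() is the replacement
>>> db["table"].insert_one({"name": "Tom"}).inserted_id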
ObjectId('5adcb250d7696c839a251658')

>>> db["table"].find_one({"name": "Tom"})
{'_id': ObjectId('5adcb250d7696c839a251658'), 'name': 'Tom'}

redis

>>> import redis

>>> r = redis.Redis("localhost", 6379)

>>> r.set("name", "Tom")
True
>>> r.get("name")
b'Tom'

Web framework packages:
flask http://docs.jinkan.org/docs/flask/
django https://www.djangoproject.com/
jupyter http://jupyter.org/
Run: jupyter notebook
Keyboard shortcut to insert a new cell below (in command mode): b

Install all of the libraries above with one command

pip install requests selenium lxml beautifulsoup4 pyquery pymysql pymongo redis flask django jupyter
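
After installation, it can help to confirm which packages are actually importable. The sketch below is one possible check using only the standard library. The PACKAGES mapping and the check_installed helper are illustrative assumptions, not part of the original post. Note that the pip name and the import name differ for beautifulsoup4 (imported as bs4). jupyter is left out because it is normally verified by running the jupyter command itself.

```python
from importlib.util import find_spec

# pip package name -> importable module name (hypothetical mapping for this check).
PACKAGES = {
    "requests": "requests",
    "selenium": "selenium",
    "lxml": "lxml",
    "beautifulsoup4": "bs4",
    "pyquery": "pyquery",
    "pymysql": "pymysql",
    "pymongo": "pymongo",
    "redis": "redis",
    "flask": "flask",
    "django": "django",
}


def check_installed(packages):
    """Map each pip name to True if its module can be found, else False."""
    return {
        pip_name: find_spec(module) is not None
        for pip_name, module in packages.items()
    }


# Report the status of every package without importing any of them.
for pip_name, ok in check_installed(PACKAGES).items():
    print(f"{pip_name}: {'installed' if ok else 'missing'}")
```

find_spec only locates the module without importing it, so the check stays fast and avoids side effects from package initialization.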