
Installing and Starting pyspider

Peter_Gao_, published 2019-11-28 09:06:24

Environment: Windows 10 + Python 3.8 + IDEA

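Install from the command line: pip3 install pyspider

If no errors are reported, start it with: pyspider all

Open http://localhost:5000/ in a browser (the 5000 must match the number in the red box in the screenshot below). If you see the following page, startup succeeded!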

 

I ran into two errors along the way.

1. Wheel error

On Windows the following problem often appears:

Command "python setup.py egg_info" failed with error code 10 in

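Solution:

1) Install the wheel dependency: pip3 install wheel

2) Download the other dependency, pycurl, from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and install it. On that page, press Ctrl + F and search for pycurl.

The file name has the form pycurl-<version>-<Python version tag>-<win32 or win_amd64>, where the Python version tag is, for example, cp34 for Python 3.4. Download the file that matches your Python version and platform.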
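To make the file-name rule above concrete, here is a small sketch that prints the two tags to look for. The helper name wheel_tags is made up for illustration. It assumes a Windows install and takes the 32/64-bit part from the Python interpreter itself, which is what a wheel has to match.

```python
import struct
import sys


def wheel_tags(version_info=sys.version_info, pointer_bits=None):
    """Return (python_tag, platform_tag) to look for in a Windows wheel name.

    Hypothetical helper for illustration; not part of pip or pyspider.
    """
    if pointer_bits is None:
        # Size of a C pointer in bits: 32 or 64 for the running interpreter.
        pointer_bits = struct.calcsize("P") * 8
    python_tag = "cp{}{}".format(version_info[0], version_info[1])
    platform_tag = "win_amd64" if pointer_bits == 64 else "win32"
    return python_tag, platform_tag


if __name__ == "__main__":
    print(wheel_tags())
```

Run it with the same interpreter that pip3 uses, then pick the pycurl file whose name contains both tags.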


3) Install the downloaded wheel: on the command line, run pip install followed by the path to the .whl file you downloaded (for example d:\pycurl-7.43.1-cp34-cp34m-win_amd64.whl)

pip3 install F:\各种浏览器下载\谷歌浏览器下载\pycurl-7.43.0.3-cp37-cp37m-win_amd64.whl


4) Then install pyspider again: pip install pyspider

 

2. Error on startup

async is a reserved keyword as of Python 3.7, so it can no longer be used as a parameter name.
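A quick way to confirm this on your own interpreter is to try compiling a function that uses the name as a parameter. This is only a sketch, and is_valid_parameter_name is a made-up helper. On Python 3.7 or later it should reject async and accept an ordinary name such as shark.

```python
import keyword


def is_valid_parameter_name(name):
    """Return True if `name` compiles as a function parameter name."""
    try:
        # Only compiles the source; nothing is executed.
        compile("def f({}=False): pass".format(name), "<check>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print("async is a keyword:", keyword.iskeyword("async"))
    print("async usable as parameter:", is_valid_parameter_name("async"))
    print("shark usable as parameter:", is_valid_parameter_name("shark"))
```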


[root@localhost python]# pyspider all
[W 180629 07:08:26 run:413] phantomjs not found, continue running without it.
[I 180629 07:08:29 result_worker:49] result_worker starting...
[I 180629 07:08:31 processor:211] processor starting...
[I 180629 07:08:31 tornado_fetcher:638] fetcher starting...
[I 180629 07:08:31 scheduler:675] scheduler starting...
[I 180629 07:08:31 scheduler:614] in 5m: new:0,success:0,retry:0,failed:0
[I 180629 07:08:31 scheduler:810] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 180629 07:08:32 app:84] webui exiting...
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.6.5/bin/pyspider", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 754, in main
    cli()
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 384, in webui
    app.run(host=host, port=port)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/webui/app.py", line 59, in run
    from .webdav import dav_app
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/webui/webdav.py", line 216, in <module>
    dav_app = WsgiDAVApp(config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/WsgiDAV-3.0.0a2-py3.6.egg/wsgidav/wsgidav_app.py", line 122, in __init__
    _check_config(config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/WsgiDAV-3.0.0a2-py3.6.egg/wsgidav/wsgidav_app.py", line 104, in _check_config
    raise ValueError("Invalid configuration:\n  - " + "\n  - ".join(errors))
ValueError: Invalid configuration:
  - Deprecated option 'dir_browser.enable': use 'middleware_stack' instead.
  - Deprecated option 'domaincontroller': use 'domain_controller' instead.

ImportError: cannot import name 'CurlasyncHTTPClient' from 'tornado.curl_httpclient' (/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tornado/curl_httpclient.py)
 


There are two possible solutions:
1. Replace the keyword.
2. Downgrade Python.

I chose the first one and replaced the keyword.


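In run.py, tornado_fetcher.py, and webui/app.py, press Ctrl+F, find async, and replace it with some other word, for example shark.

When doing a bulk replace, tick both "Words" (whole-word match) and "Match case" so that other identifiers that merely contain async are not replaced by mistake.
The occurrences of async are in:
1 ...site-packages/pyspider/run.py
2 .../site-packages/pyspider/fetcher/tornado_fetcher.py
Replace async with shark in each of them.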
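The whole-word, case-sensitive replacement described above can also be sketched in Python with a regular expression. This is an illustration rather than a ready-made patch tool: rename_async is a made-up helper, and like the editor's search it also rewrites matches inside strings and comments, so review the result before saving it over the installed files.

```python
import re

# \b is a word boundary, so only the standalone lowercase word "async"
# matches; names such as CurlAsyncHTTPClient or async_mode are left alone.
ASYNC_WORD = re.compile(r"\basync\b")


def rename_async(source, new_name="shark"):
    """Return `source` with every standalone `async` replaced by `new_name`."""
    return ASYNC_WORD.sub(new_name, source)


if __name__ == "__main__":
    sample = "Fetcher(None, None, async=False)  # uses CurlAsyncHTTPClient"
    print(rename_async(sample))
```

To apply it to a file, read the file's text, pass it through rename_async, and write the result back after making a backup copy.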

Run pyspider all again. This time it fails with:
ValueError: Invalid configuration: - Deprecated option 'domaincontroller': use 'http_authenticator

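Locate the installed pyspider package, open webdav.py in its webui directory, and edit line 209.

Change
'domaincontroller': NeedAuthController(app),
to
'http_authenticator': {
        'HTTPAuthenticator': NeedAuthController(app),
},

Note the comma after the closing brace. Leaving it out cost me an afternoon of debugging.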
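The reason the comma matters is that this entry sits inside a larger dict literal, and an entry followed by another one without a comma is a syntax error. The sketch below checks this with ast.parse on two stand-in snippets. The names controller and 'verbose' are placeholders chosen for illustration, not a copy of webdav.py.

```python
import ast

# Stand-in for the edited dict; `controller` and 'verbose' are placeholders.
WITH_COMMA = """
config = {
    'http_authenticator': {
        'HTTPAuthenticator': controller,
    },
    'verbose': 1,
}
"""

# Same snippet with the comma after the inner closing brace removed.
WITHOUT_COMMA = WITH_COMMA.replace("},\n    'verbose'", "}\n    'verbose'")


def parses(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)  # parses only; nothing is executed
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print("with comma:", parses(WITH_COMMA))
    print("without comma:", parses(WITHOUT_COMMA))
```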

Locate the installed pyspider package, open app.py in its webui directory, and edit line 95.

    'fetch': lambda x: tornado_fetcher.Fetcher(None, None, async=False).fetch(x),
becomes
    'fetch': lambda x: tornado_fetcher.Fetcher(None, None, shark=False).fetch(x),
 

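Start it again:

This starts all pyspider components: PhantomJS, ResultWorker, Processor, Fetcher, Scheduler, and WebUI, all of which pyspider needs in order to run. The last line of output shows the WebUI running on port 5000. Open a browser and go to http://localhost:5000 to see the start page.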
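If the browser shows nothing, it can help to check whether anything is listening on the port at all. The sketch below uses only the standard library. port_is_open is a made-up helper, and the host and port defaults are taken from the URL above.

```python
import socket


def port_is_open(host="localhost", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False


if __name__ == "__main__":
    print("WebUI port reachable:", port_is_open())
```

A True result only means that something accepts connections on the port. It does not prove that the listener is pyspider.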


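Summary:

This is Python 3.8. Import the keyword module to get the list of reserved words. None of the names in it can be used as parameter names, and of course they cannot be used as variable names either.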
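The list can be printed like this, using only the standard keyword module:

```python
import keyword

# All reserved words of the running interpreter.
print(keyword.kwlist)

# True on Python 3.7 and later, where async became a reserved keyword.
print("async" in keyword.kwlist)
```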

 

 

 
