How to Scrape Login-Protected Websites with a Python Crawler: Simulating Login to Fetch Restricted Content - Python Tutorial - PHP中文网

雪夜

Published: 2025-11-07 14:29:17

Original

495 views

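Use requests or Selenium to simulate the login and keep the session alive: 1. use a Session to fetch the csrf token and submit the login form; 2. for JS-rendered pages, drive a browser with Selenium to log in, then copy its cookies into a requests session; 3. reuse the same Session object for every later request to protected content.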

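The key to scraping a site that requires login is to simulate the login process and maintain the session state. The crawler then requests restricted pages while carrying valid credentials (such as cookies or a token), which gives it access to the protected content. The common approaches and their steps are described below.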

Simulating login with requests + BeautifulSoup

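Most sites with a login submit the username and password through a form. After inspecting the login endpoint (for example in the browser's developer tools), send a POST request with requests and keep the cookies it returns; later requests can then carry those credentials.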
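
To make "keeping the returned cookies" concrete, here is a minimal standard-library sketch of what a session does with the Set-Cookie headers of a login response: it records each name/value pair and replays them in the Cookie header of later requests. The header values and the helper name cookie_header_from_set_cookie are made up for illustration. requests.Session does this automatically, and it also honors domain, path and expiry, which this sketch ignores.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Turn a list of Set-Cookie header values into one Cookie header value."""
    stored = {}
    for raw in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(raw)  # attributes such as Path/HttpOnly are parsed but not used here
        for name, morsel in parsed.items():
            stored[name] = morsel.value
    return "; ".join(f"{name}={value}" for name, value in stored.items())


# Hypothetical Set-Cookie headers returned by a login endpoint
login_response_headers = [
    "sessionid=abc123; Path=/; HttpOnly",
    "csrftoken=xyz789; Path=/",
]
cookie_header = cookie_header_from_set_cookie(login_response_headers)
print(cookie_header)
```

The resulting string is what would go into the Cookie header of the next request if the session were managed by hand.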

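The basic flow is:

Request the login page and read its hidden fields (such as the csrf token)
Build the login payload: username, password, and any required hidden parameters
Send a POST request to the login endpoint
Check whether the login succeeded (by the redirect or by the response content)
Use the same session object to request the other protected pages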
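
The first two steps of this flow can be sketched with the standard library alone. The snippet below collects every hidden input from a login page and merges it into the login payload, so that no hidden parameter is missed when a form has more than one. The sample HTML, the field names username, password, csrf and next, and the helper names are assumptions for illustration; a real form has to be inspected for its actual field names.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pairs of every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(login_html, username, password):
    """Merge the page's hidden fields with the user's credentials."""
    collector = HiddenInputCollector()
    collector.feed(login_html)
    payload = dict(collector.fields)
    payload["username"] = username  # assumed field name
    payload["password"] = password  # assumed field name
    return payload


# Hypothetical login page markup
sample_html = """
<form action="/login" method="post">
  <input type="hidden" name="csrf" value="abc123">
  <input type="hidden" name="next" value="/dashboard">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""
payload = build_login_payload(sample_html, "your_username", "your_password")
print(payload)
```

The returned dict can be passed as the data argument of session.post.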

Example code:

import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Step 1: fetch the login page and extract the csrf token
login_url = 'https://www.php.cn/link/d9976f1c2c0c972d1cee0c3647cbd194'
res = session.get(login_url)
soup = BeautifulSoup(res.text, 'html.parser')
csrf_token = soup.find('input', {'name': 'csrf'})['value']

# Step 2: submit the login form
login_data = {
    'username': 'your_username',
    'password': 'your_password',
    'csrf': csrf_token
}
login_res = session.post(login_url, data=login_data)
login_res.raise_for_status()  # stop early if the server answered with an HTTP error

# Step 3: request the protected page with the same session
protected_page = session.get('https://www.php.cn/link/fad68ee497f1cf9108b630e7ce630e6c')
print(protected_page.text)
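
The flow lists a check for whether the login succeeded, judged by the redirect or the response content. A minimal sketch of such a check follows. The "/login" path, the "Logout" success marker and the failure message are placeholders that have to be adapted to the target site, and looks_logged_in is a hypothetical helper name.

```python
from urllib.parse import urlparse


def looks_logged_in(status_code, final_url, body,
                    login_path="/login",
                    success_marker="Logout",
                    failure_markers=("Invalid username or password",)):
    """Heuristically decide whether a login response indicates success."""
    if status_code >= 400:
        return False
    if any(marker in body for marker in failure_markers):
        return False
    if success_marker in body:
        return True
    # No explicit marker found: fall back to whether we were
    # redirected away from the login page.
    return urlparse(final_url).path.rstrip("/") != login_path.rstrip("/")


print(looks_logged_in(200, "https://example.com/dashboard", "<a href='/logout'>Logout</a>"))
print(looks_logged_in(200, "https://example.com/login", "Invalid username or password"))
```

With requests, the three inputs would come from the response to the login POST: its status_code, its url (the final URL after redirects) and its text.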

Handling JavaScript-rendered logins (with Selenium)

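On some sites the front end is rendered dynamically by JavaScript, the form is submitted via Ajax, and the login state depends on local storage (such as localStorage). Logins like these are hard to reproduce directly with requests, so driving a real browser with Selenium is the recommended approach.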

Main advantages:

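Executes JS automatically and loads dynamic content
Supports user actions such as clicking, typing and waiting
Exposes the cookies generated after login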

Example code:

import time

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.php.cn/link/d9976f1c2c0c972d1cee0c3647cbd194')

# Fill in the form and submit it
# (the find_element_by_* helpers were removed in recent Selenium 4 releases)
driver.find_element(By.NAME, 'username').send_keys('your_username')
driver.find_element(By.NAME, 'password').send_keys('your_password')
driver.find_element(By.TAG_NAME, 'form').submit()

time.sleep(3)  # Wait for the login to finish

# Copy the browser's cookies into a requests session
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

# The session can now be used to fetch content
res = session.get('https://www.php.cn/link/6499e19d47d7cbd3302a26fdb40d0b41')
print(res.text)

driver.quit()
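
Each dict returned by Selenium's get_cookies() also carries a domain (and usually a path), and a browser session often holds cookies for several domains. The sketch below keeps only the cookies that apply to a given host and builds the matching Cookie header. The cookie data and the helper name cookie_header_for are made up for illustration, and path, secure and expiry are deliberately ignored.

```python
def cookie_header_for(selenium_cookies, host):
    """Build a Cookie header from the cookies whose domain covers `host`."""
    pairs = []
    for cookie in selenium_cookies:
        # A leading dot (".example.com") means the cookie also covers subdomains.
        domain = cookie.get("domain", "").lstrip(".")
        if domain and (host == domain or host.endswith("." + domain)):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


# Hypothetical output of driver.get_cookies()
browser_cookies = [
    {"name": "sessionid", "value": "abc", "domain": ".example.com", "path": "/"},
    {"name": "tracker", "value": "zzz", "domain": "ads.other.net", "path": "/"},
    {"name": "pref", "value": "dark", "domain": "www.example.com", "path": "/"},
]
print(cookie_header_for(browser_cookies, "www.example.com"))
```

When a requests session is used instead of a hand-built header, the same information can be preserved by passing domain and path as keyword arguments to session.cookies.set.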
