Some web content can only be accessed after logging in.

There are several ways to simulate a login.

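1. Logging in with a POST request

1.1 Find the URL, Content-Type, and form data

(1) Find the URL of the POST request

Open the login page, open Chrome DevTools (right-click a blank area of the page and click [Inspect]), enter the account and password, and log in.

In the Network panel, find the URL of the POST request. The name of the login request usually contains login, as shown in the figure below:

[Figure: post login]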

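(2) Find the Content-Type

Find Content-Type in the request headers. Content-Type tells the server what type of data the client is sending. Four common POST content types are:

  • application/x-www-form-urlencoded: the default for HTML form submissions. The data is encoded as key1=val1&key2=val2&..., and both keys and values are URL-encoded.
  • application/json: tells the server that the message body is a serialized JSON string.
  • text/xml: the message body is XML.
  • multipart/form-data: usually used to upload files.

Example header values:

Content-Type: application/x-www-form-urlencoded
Content-Type: application/json;charset=UTF-8
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryrGKCBY7qhFd3TrwA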
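
To make the difference between the first two content types concrete, the sketch below encodes the same data both ways using only the Python standard library. The field values are made-up placeholders, not real credentials.

```python
import json
from urllib.parse import urlencode

# Placeholder form fields (hypothetical values, for illustration only)
form = {'userName': 'alice', 'password': 'p@ss word', 'verifyCode': ''}

# application/x-www-form-urlencoded: key=value pairs joined by '&', URL-encoded
urlencoded_body = urlencode(form)

# application/json: the same data serialized as a JSON string
json_body = json.dumps(form)

print(urlencoded_body)
print(json_body)
```

Note how the '@' and the space in the password are escaped in the form-encoded body but left untouched in the JSON body.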

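(3) Find the form data

The submitted form data can be viewed under the [Payload] tab in the figure above, for example:

userName=xxx&password=xxx&verifyCode=

The submitted form data contains the user name, the password (the password seen here has most likely already been encrypted), and the verification code.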
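
A payload string copied from DevTools can be turned into a dict that is ready to pass to Requests. This is a minimal sketch using the standard library; the values are the masked ones from the example above.

```python
from urllib.parse import parse_qsl

# Payload string as copied from the Payload tab (values masked, as in the text above)
payload = 'userName=xxx&password=xxx&verifyCode='

# keep_blank_values=True keeps fields such as verifyCode that are empty
param = dict(parse_qsl(payload, keep_blank_values=True))

print(param)
```

Without keep_blank_values=True, the empty verifyCode field would be dropped silently.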

1.2 Get the verification code

The form submission above requires a verification code.


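Approach: fetch the verification-code image with a GET request, then use OCR to recognize the characters in the image.

(1) Fetch the verification-code image

The code is simple: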

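import requests

url = 'https://b.leyaoyao.com/lyy/rest/group/distributor/getLoginVerifyCode'

# Use a session so that any cookies set here are sent with later requests
session = requests.Session()
r = session.get(url)

# The response body is the raw image, so write it in binary mode
with open('verify_code_example.jpg', 'wb') as f:
    f.write(r.content)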

[Figure: verify_code_example]

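(2) Recognize the text in the image

For OCR, there are currently two mainstream open-source frameworks, Tesseract and EasyOCR. After a quick look, I lean toward Tesseract.

1.2.1 Tesseract

Install pytesseract: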

sudo apt install libtesseract-dev tesseract-ocr
pip3 install pytesseract

Recognize the text in the image:

import pytesseract
from PIL import Image

text = pytesseract.image_to_string(Image.open('verify_code_example.jpg'))

Unfortunately, pytesseract could not recognize the verification code above; the returned text was empty.

1.2.2 EasyOCR

Install EasyOCR:

pip3 install easyocr

Use easyocr as follows. The supported languages are listed under Supported Languages.

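import easyocr

# Load the Simplified Chinese and English models; run on the CPU
reader = easyocr.Reader(['ch_sim', 'en'], gpu=False)
# readtext returns a list of (bounding box, text, confidence) entries
text = reader.readtext('verify_code_example.jpg')
print(text)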

Example output:

# python3 tmp.py
Using CPU. Note: This module is much faster with a GPU.
[([[0, 0], [55, 0], [55, 18], [0, 18]], 'FGSJU', 0.8078654978535195)]

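reader.readtext returns a list of results; each entry contains the bounding-box coordinates, the recognized text, and a confidence score.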
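
Given that structure, picking the verification code out of the result could look like the sketch below. The helper name best_text and the 0.5 confidence threshold are illustrative assumptions, not part of EasyOCR; the sample data is the output shown above.

```python
# Sample result copied from the run above: a list of (bounding box, text, confidence)
results = [([[0, 0], [55, 0], [55, 18], [0, 18]], 'FGSJU', 0.8078654978535195)]


def best_text(results, min_confidence=0.5):
    """Return the highest-confidence text, or None if nothing passes the threshold.

    Hypothetical helper; min_confidence is an arbitrary cut-off.
    """
    candidates = [(conf, text) for _bbox, text, conf in results
                  if conf >= min_confidence]
    if not candidates:
        return None
    # Compare by confidence only
    _conf, text = max(candidates, key=lambda c: c[0])
    return text


verify_code = best_text(results)
print(verify_code)
```

Returning None for low-confidence results makes it easy to fetch a fresh verification code and retry instead of submitting a likely wrong one.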

If the following message appears, the recognition model needs to be downloaded. Models are stored under .EasyOCR/model in the home directory.

Using CPU. Note: This module is much faster with a GPU.
Downloading recognition model, please wait. This may take several minutes depending upon your network connection.

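Note: the first run also needs to download the detection model. If the download is slow, you can download the models manually from Jaided AI: EasyOCR model hub, unzip them, and put them in the ~/.EasyOCR/model/ directory.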
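
To check whether the models are already in place before running EasyOCR, one option is to list the files in that directory. This is a small sketch using only the standard library; list_easyocr_models is a hypothetical helper, and the default path is the one named in the text above.

```python
from pathlib import Path


def list_easyocr_models(model_dir=None):
    """Return the sorted names of files found in the EasyOCR model directory."""
    if model_dir is None:
        # Default location described above: ~/.EasyOCR/model
        model_dir = Path.home() / '.EasyOCR' / 'model'
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return []
    return sorted(p.name for p in model_dir.iterdir() if p.is_file())


models = list_easyocr_models()
print(models if models else 'No models found; EasyOCR will download them on first use.')
```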

Jaided AI: EasyOCR documentation

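1.3 Simulate the login

Once the URL and the data are known, the POST request can be sent with the Requests module. After the simulated login, the session keeps the logged-in state.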

import requests

session = requests.Session()

def login():
    url = 'https://b.leyaoyao.com/lyy/rest/group/distributor/login'
    # Fill in the account and the password value as seen in the Payload tab
    param = {
        'userName': '',
        'password': ''
    }

    # Passing a dict via data= sends it as application/x-www-form-urlencoded
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    }

    response_login = session.post(url, headers=headers, data=param)

    return response_login.status_code

If the Content-Type is application/json;charset=UTF-8, call json.dumps(param) to serialize the data into a JSON string:

import json

# Get the venue revenue
url = 'https://b.leyaoyao.com/lyy/rest/income/benefit/group'

# Request parameters
param = {
    "startDate": "2022-08-17",
    "endDate": "2022-08-17",
    "pageIndex": 1,
    "pageSize": 10,
    "field": "all_amount",
    "sortDirection": "desc",
    "labels": []
}

# Send the serialized JSON string; headers must declare Content-type: application/json
response = session.post(url, data=json.dumps(param), headers=headers)

At the same time, specify 'Content-type': 'application/json' in the headers:

headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 9.0; SAMSUNG SM-F900U Build/PPR1.180610.011) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Mobile Safari/537.36',
    'Content-type': 'application/json'
}
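
As an alternative to serializing by hand, Requests also accepts a json= argument that serializes the dict and sets the Content-Type header itself. The sketch below builds both kinds of request without sending them, so the headers and body can be inspected offline. The URL is a placeholder (example.com), not the real endpoint, and the parameters are a shortened illustration.

```python
import requests

param = {"pageIndex": 1, "pageSize": 10}
url = 'https://example.com/api'  # placeholder URL; nothing is sent

# data=<dict> produces a form-encoded body
form_req = requests.Request('POST', url, data=param).prepare()

# json=<dict> serializes to JSON and sets Content-Type: application/json
json_req = requests.Request('POST', url, json=param).prepare()

print(form_req.headers['Content-Type'], form_req.body)
print(json_req.headers['Content-Type'], json_req.body)
```

With a session, the equivalent call would be session.post(url, json=param, headers=headers), in which case the Content-type entry in headers becomes optional.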

1.4 Handling the response

The result returned by the POST request can be viewed under the Preview tab:

[Figure: chrome-inspect-preview]

If the response is in JSON format, response.json() parses it and returns the corresponding Python object:

result = response.json()

As an example, query the detailed revenue on Leyaoyao (乐摇摇):

https://b.leyaoyao.com/lyy/rest/income/benefit/group

param = {
  "startDate": "2022-08-17",
  "endDate": "2022-08-17",
  "pageIndex": 1,
  "pageSize": 10,
  "field": "all_amount",
  "sortDirection": "desc",
  "labels": []
}

# response.json
{"result":0,"description":"","data":{"page":1,"total":1,"pageSize":10,"maxPage":1,"times":0,"items":[{"amount":1100.00,"onlineAmount":0.00,"offlineAmount":559.00,"adAmount":0.00,"giftAmount":324.00,"gameAmount":0.00,"gameGiftAmount":null,"rdGiftAmount":0.00,"adCount":0,"coins":977,"onlineCoins":101,"offlineCoins":876,"actualCoins":2003,"giftQuantity":27,"gameGiftQuantity":0,"rdGiftQuantity":null,"redCoins":0,"payCoins":1232,"wechatPayCount":null,"alipayPayCount":null,"unionPayCount":null,"jdPayCount":null,"payTotalCount":295,"giftConsumptionWeight":0,"customServiceFee":0.00,"valueAddedServiceFee":0.00,"groupId":1145312,"groupName":"吴川","equipmentCount":43,"module":"4,5,1,2,3,6,","displaySortBenefit":"Y","isactive":"Y"}],"offset":0}}
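
Once parsed, the fields of interest can be pulled out of the nested structure. The sketch below works on a trimmed copy of the response above so that it runs without a network connection. Treating result == 0 as success is an assumption based on this single sample; the source does not document the field.

```python
import json

# Trimmed copy of the response shown above (only a few fields kept)
raw = ('{"result":0,"description":"","data":{"page":1,"total":1,"items":['
       '{"amount":1100.00,"offlineAmount":559.00,"giftAmount":324.00,'
       '"coins":977,"groupId":1145312,"groupName":"吴川","equipmentCount":43}]}}')

result = json.loads(raw)  # response.json() would return the same structure

rows = []
# Assumption: result == 0 signals success
if result.get('result') == 0:
    for item in result['data']['items']:
        rows.append((item['groupName'], item['amount'], item['coins']))

print(rows)
```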
This article is an original work by Spark & Shine; please credit the source when reposting. Last modified: 2023-10-14 15:11
