
Observing the site

Viewing the page source showed only the site's front-end layout, but after capturing traffic with F12 I found a response named [practice] that contains two fields for each question: "label" and "questionBh".
(Figure: practice question list page)
After opening a question and clicking "view answer", the captured response named "data" contains the complete answer.
![Open a question and click the reference answer](/images/3.png)
Opening the headers of that second capture shows the URL is http://[some domain]/api/exam/question/getAnswer?questionBh={questionBh}
(Figure: headers of the second capture)
So we can chain label -> questionBh -> answer.

Writing the code

First, the label -> questionBh step:

import requests
import re

url = 'http://[some domain]/api/exam/question/practice'

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Edg/130.0.0.0"
}
resp = requests.get(url, headers=headers)

# Grab each question's label and questionBh; the closing quote after
# the label anchors the lazy capture so it is not left empty
obj = re.compile(r'"label":"(?P<num>.*?)".*?"questionBh":"(?P<web>.*?)"', re.S)
result = obj.finditer(resp.text)

for i in result:
    print(i.group("num", "web"))

resp.close()
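Since the practice response is JSON, the regex is not strictly necessary: resp.json() can walk the parsed structure directly. A minimal sketch, assuming the payload is a "data" list whose items carry "label" and "questionBh" (the exact shape is an inference from the capture, not confirmed):

import requests

url = 'http://[some domain]/api/exam/question/practice'
headers = {"user-agent": "Mozilla/5.0"}

resp = requests.get(url, headers=headers)
# Assumed payload shape: {"data": [{"label": "5", "questionBh": "02ac..."}, ...]}
for item in resp.json().get("data", []):
    print(item["label"], item["questionBh"])
resp.close()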

Then tidy the output into CSV format, like the sample below (a sketch that writes the file directly follows the sample):

num,questionBh
5,02ac8e3a76404d668bea36a31c90fc7b
*,********************************
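Instead of reshaping the printout by hand, the csv module can write mapping.csv directly. A sketch reusing the regex from step one; the utf-16be encoding is an assumption chosen only to match what the combined script later reads with:

import requests
import re
import csv

url = 'http://[some domain]/api/exam/question/practice'
headers = {"user-agent": "Mozilla/5.0"}

resp = requests.get(url, headers=headers)
obj = re.compile(r'"label":"(?P<num>.*?)".*?"questionBh":"(?P<web>.*?)"', re.S)

# Write a header row plus one row per (num, questionBh) pair
with open('mapping.csv', 'w', newline='', encoding='utf-16be') as f:
    writer = csv.writer(f)
    writer.writerow(['num', 'questionBh'])
    for i in obj.finditer(resp.text):
        writer.writerow([i.group('num'), i.group('web')])
resp.close()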

Next, the questionBh -> answer step:

import requests
import re

query = input("Enter the questionBh you want to look up\n")
url = f'http://[some domain]/api/exam/question/getAnswer?questionBh={query}&studentId=2024011666'

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Edg/130.0.0.0"
}
resp = requests.get(url, headers=headers)

# The answer lives in the "data" field of the response
obj = re.compile(r'"data":"(?P<ans>.*?)"}', re.S)
result = obj.finditer(resp.text)

# Print the raw (still escape-encoded) answers
for i in result:
    print(i.group("ans"))

resp.close()


Now combine the pieces, chained together through the CSV file:

import csv
import requests
import re


def load_mapping(file_path):
    mapping = {}
    with open(file_path, mode='r', encoding='utf-16be') as file:
        reader = csv.reader(file)
        next(reader)  # skip the header row
        for row in reader:
            num, questionBh = row
            mapping[num] = questionBh
    return mapping


def get_question_bh(num, mapping):
    return mapping.get(num, "Not Found")


file_path = 'mapping.csv'

k = input("Enter a question number: ")

# Load the label -> questionBh mapping
mapping = load_mapping(file_path)

question_bh = get_question_bh(k, mapping)
print(f"Resolved questionBh: {question_bh}")

k = question_bh

url = f'http://[some domain]/api/exam/question/getAnswer?questionBh={k}'

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Edg/130.0.0.0"
}
resp = requests.get(url, headers=headers)

obj = re.compile(r'"data":"(?P<ans>.*?)"}', re.S)
result = obj.finditer(resp.text)

Parsing escape sequences with ast

While learning, I found that ast.literal_eval() can decode the escape sequences and remove the stray characters, which saves the trouble of pasting the text into an AI to convert it.

import ast  # continuing from the script above

ans_list = [i.group("ans") for i in result]

# Join everything in the list into one string
a = ''.join(ans_list)

# Safely decode the escape sequences with ast.literal_eval()
escaped_text = ast.literal_eval(f'"""{a}"""')
print(escaped_text)


resp.close()
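To make the effect concrete, here is a standalone toy example (the payload string is made up): escape sequences such as \n and \uXXXX in the raw JSON text come out as real characters. Note the trick assumes the payload itself contains no triple quotes.

import ast

# Hypothetical escaped payload, as it would appear inside resp.text
raw = r"Answer:\n1. A\n2. B\u3002"

print(ast.literal_eval(f'"""{raw}"""'))
# Answer:
# 1. A
# 2. B。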

Complete code

import csv
import requests
import re
import ast


def load_mapping(file_path):
    mapping = {}
    with open(file_path, mode='r', encoding='utf-16be') as file:
        reader = csv.reader(file)
        next(reader)  # skip the header row
        for row in reader:
            num, questionBh = row
            mapping[num] = questionBh
    return mapping


def get_question_bh(num, mapping):
    return mapping.get(num, "Not Found")


# Example input
file_path = 'mapping.csv'

k = input("Enter a question number: ")

# Load the label -> questionBh mapping
mapping = load_mapping(file_path)

# Look up the corresponding questionBh
question_bh = get_question_bh(k, mapping)
print(f"Resolved questionBh: {question_bh}")

k = question_bh

url = f'http://[some domain]/api/exam/question/getAnswer?questionBh={k}'

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Edg/130.0.0.0"
}
resp = requests.get(url, headers=headers)

obj = re.compile(r'"data":"(?P<ans>.*?)"}', re.S)
result = obj.finditer(resp.text)

ans_list = [i.group("ans") for i in result]

# Join everything in the list into one string
a = ''.join(ans_list)

# Safely decode the escape sequences with ast.literal_eval()
escaped_text = ast.literal_eval(f'"""{a}"""')
print(escaped_text)

resp.close()

Results

The generated output is shown below; after a little cleanup it can be pasted straight into the submission box.
(Figure: generated output)

References

[bilibili] Python crawler tutorial — through this video I learned packet capture, the re module, regular expressions, and the basics of how HTTP communication works
Online regular expression tester

get

import requests

query = input("Enter what you want to search for\n")
url = f'https://www.sogou.com/web?query={query}'

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Edg/130.0.0.0"
}
resp = requests.get(url, headers=headers)

print(resp)
print(resp.text)
resp.close()

post

import requests

url = "https://fanyi.baidu.com/sug"

word = input("Enter a word to translate: ")
dat = {
    "kw": word
}

resp = requests.post(url, data=dat)
print(resp.json())
resp.close()

re

import re

findall — fine for very small inputs; returns all matches as a list

lst = re.findall(r"\d+", "my10086")
print(lst)  # ['10086']

finditer returns an iterator; the one used most often

it = re.finditer(r"\d+", "my10086")
for i in it:
    print(i.group())

search returns as soon as it finds one match

# search returns a Match object; call .group() to get the text
lst = re.search(r"\d+", "my10086")
print(lst.group())

match must match from the very start of the string

# Because match anchors at position 0, re.match(r"\d+", "my10086")
# would return None; the sample string therefore starts with digits
s = re.match(r"\d+", "10086my")
print(s.group())

Pre-compiling a regular expression for reuse

obj = re.compile(r"\d+")
it = obj.finditer("1234")
for i in it:
    print(i.group())

it = obj.finditer("5678")
for i in it:
    print(i.group())

Capturing a named group

s = "xxx10086xxxhelloxxx"

obj = re.compile(r"xxx.*?xxx(?P<name>.*?)xxx", re.S)
# Take all groups (finditer is re-run because an iterator is
# exhausted after one pass)
for i in obj.finditer(s):
    print(i.group())
# Take one specific group; syntax: (?P<name>pattern)
for i in obj.finditer(s):
    print(i.group("name"))


Future polish I'd like to add

None yet