Merge branch 'master' of https://github.com/OverKit/gpt_academic into OverKit-master

This commit is contained in:
505030475 2023-06-19 14:49:50 +10:00
commit dca9ec4bae
7 changed files with 173 additions and 66 deletions

View File

@ -16,7 +16,7 @@ To translate this project to arbitary language with GPT, read and run [`multi_la
>
> 1.请注意只有**红颜色**标识的函数插件(按钮)才支持读取文件,部分插件位于插件区的**下拉菜单**中。另外我们以**最高优先级**欢迎和处理任何新插件的PR
>
> 2.本项目中每个文件的功能都在自译解[`self_analysis.md`](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)详细说明。随着版本的迭代您也可以随时自行点击相关函数插件调用GPT重新生成项目的自我解析报告。常见问题汇总在[`wiki`](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98)当中。[安装方法](#installation)。
> 2.本项目中每个文件的功能都在自译解[`self_analysis.md`](https://github.com/binary-husky/gpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)详细说明。随着版本的迭代您也可以随时自行点击相关函数插件调用GPT重新生成项目的自我解析报告。常见问题汇总在[`wiki`](https://github.com/binary-husky/gpt_academic/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98)当中。[安装方法](#installation)。
>
> 3.本项目兼容并鼓励尝试国产大语言模型chatglm和RWKV, 盘古等等。支持多个api-key共存可在配置文件中填写如`API_KEY="openai-key1,openai-key2,api2d-key3"`。需要临时更换`API_KEY`时,在输入区输入临时的`API_KEY`然后回车键提交后即可生效。
@ -31,13 +31,13 @@ To translate this project to arbitary language with GPT, read and run [`multi_la
一键中英互译 | 一键中英互译
一键代码解释 | 显示代码、解释代码、生成代码、给代码加注释
[自定义快捷键](https://www.bilibili.com/video/BV14s4y1E7jN) | 支持自定义快捷键
模块化设计 | 支持自定义强大的[函数插件](https://github.com/binary-husky/chatgpt_academic/tree/master/crazy_functions),插件支持[热更新](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)
[自我程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] [一键读懂](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)本项目的源代码
模块化设计 | 支持自定义强大的[函数插件](https://github.com/binary-husky/gpt_academic/tree/master/crazy_functions),插件支持[热更新](https://github.com/binary-husky/gpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)
[自我程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] [一键读懂](https://github.com/binary-husky/gpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)本项目的源代码
[程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] 一键可以剖析其他Python/C/C++/Java/Lua/...项目树
读论文、[翻译](https://www.bilibili.com/video/BV1KT411x7Wn)论文 | [函数插件] 一键解读latex/pdf论文全文并生成摘要
Latex全文[翻译](https://www.bilibili.com/video/BV1nk4y1Y7Js/)、[润色](https://www.bilibili.com/video/BV1FT411H7c5/) | [函数插件] 一键翻译或润色latex论文
批量注释生成 | [函数插件] 一键批量生成函数注释
Markdown[中英互译](https://www.bilibili.com/video/BV1yo4y157jV/) | [函数插件] 看到上面5种语言的[README](https://github.com/binary-husky/chatgpt_academic/blob/master/docs/README_EN.md)了吗?
Markdown[中英互译](https://www.bilibili.com/video/BV1yo4y157jV/) | [函数插件] 看到上面5种语言的[README](https://github.com/binary-husky/gpt_academic/blob/master/docs/README_EN.md)了吗?
chat分析报告生成 | [函数插件] 运行后自动生成总结汇报
[PDF论文全文翻译功能](https://www.bilibili.com/video/BV1KT411x7Wn) | [函数插件] PDF论文提取题目&摘要+翻译全文(多线程)
[Arxiv小助手](https://www.bilibili.com/video/BV1LM4y1279X) | [函数插件] 输入arxiv文章url即可一键翻译摘要+下载PDF
@ -46,8 +46,8 @@ chat分析报告生成 | [函数插件] 运行后自动生成总结汇报
⭐Arxiv论文精细翻译 | [函数插件] 一键[以超高质量翻译arxiv论文](https://www.bilibili.com/video/BV1dz4y1v77A/),迄今为止最好的论文翻译工具⭐
公式/图片/表格显示 | 可以同时显示公式的[tex形式和渲染形式](https://user-images.githubusercontent.com/96192199/230598842-1d7fcddd-815d-40ee-af60-baf488a199df.png),支持公式、代码高亮
多线程函数插件支持 | 支持多线调用chatgpt一键处理[海量文本](https://www.bilibili.com/video/BV1FT411H7c5/)或程序
启动暗色gradio[主题](https://github.com/binary-husky/chatgpt_academic/issues/173) | 在浏览器url后面添加```/?__theme=dark```可以切换dark主题
[多LLM模型](https://www.bilibili.com/video/BV1wT411p7yf)支持[API2D](https://api2d.com/)接口支持 | 同时被GPT3.5、GPT4、[清华ChatGLM](https://github.com/THUDM/ChatGLM-6B)、[复旦MOSS](https://github.com/OpenLMLab/MOSS)同时伺候的感觉一定会很不错吧?
启动暗色gradio[主题](https://github.com/binary-husky/gpt_academic/issues/173) | 在浏览器url后面添加```/?__theme=dark```可以切换dark主题
[多LLM模型](https://www.bilibili.com/video/BV1wT411p7yf)支持 | 同时被GPT3.5、GPT4、[清华ChatGLM](https://github.com/THUDM/ChatGLM-6B)、[复旦MOSS](https://github.com/OpenLMLab/MOSS)同时伺候的感觉一定会很不错吧?
更多LLM模型接入支持[huggingface部署](https://huggingface.co/spaces/qingxu98/gpt-academic) | 加入Newbing接口(新必应),引入清华[Jittorllms](https://github.com/Jittor/JittorLLMs)支持[LLaMA](https://github.com/facebookresearch/llama)[RWKV](https://github.com/BlinkDL/ChatRWKV)和[盘古α](https://openi.org.cn/pangu/)
更多新功能展示(图像生成等) …… | 见本文档结尾处 ……
@ -91,8 +91,8 @@ chat分析报告生成 | [函数插件] 运行后自动生成总结汇报
1. 下载项目
```sh
git clone https://github.com/binary-husky/chatgpt_academic.git
cd chatgpt_academic
git clone https://github.com/binary-husky/.git
cd gpt_academic
```
2. 配置API_KEY
@ -150,8 +150,8 @@ python main.py
1. 仅ChatGPT推荐大多数人选择
``` sh
git clone https://github.com/binary-husky/chatgpt_academic.git # 下载项目
cd chatgpt_academic # 进入路径
git clone https://github.com/binary-husky/gpt_academic.git # 下载项目
cd gpt_academic # 进入路径
nano config.py # 用任意文本编辑器编辑config.py, 配置 “Proxy” “API_KEY” 以及 “WEB_PORT” (例如50923) 等
docker build -t gpt-academic . # 安装
@ -188,10 +188,10 @@ docker-compose up
按照`config.py`中的说明配置API_URL_REDIRECT即可。
4. 远程云服务器部署(需要云服务器知识与经验)。
请访问[部署wiki-1](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)
请访问[部署wiki-1](https://github.com/binary-husky/gpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)
5. 使用WSL2Windows Subsystem for Linux 子系统)。
请访问[部署wiki-2](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
请访问[部署wiki-2](https://github.com/binary-husky/gpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
6. 如何在二级网址(如`http://localhost/subpath`)下运行。
请访问[FastAPI运行说明](docs/WithFastapi.md)
@ -220,7 +220,7 @@ docker-compose up
编写强大的函数插件来执行任何你想得到的和想不到的任务。
本项目的插件编写、调试难度很低只要您具备一定的python基础知识就可以仿照我们提供的模板实现自己的插件功能。
详情请参考[函数插件指南](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)。
详情请参考[函数插件指南](https://github.com/binary-husky/gpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)。
---
# Latest Update
@ -228,7 +228,7 @@ docker-compose up
1. 对话保存功能。在函数插件区调用 `保存当前的对话` 即可将当前对话保存为可读+可复原的html文件
另外在函数插件区(下拉菜单)调用 `载入对话历史存档` ,即可还原之前的会话。
Tip不指定文件直接点击 `载入对话历史存档` 可以查看历史html存档缓存,点击 `删除所有本地对话历史记录` 可以删除所有html存档缓存
Tip不指定文件直接点击 `载入对话历史存档` 可以查看历史html存档缓存。
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/235222390-24a9acc0-680f-49f5-bc81-2f3161f1e049.png" width="500" >
</div>
@ -251,38 +251,33 @@ Tip不指定文件直接点击 `载入对话历史存档` 可以查看历史h
<img src="https://user-images.githubusercontent.com/96192199/227504931-19955f78-45cd-4d1c-adac-e71e50957915.png" height="400" >
</div>
5. 这是一个能够“自我译解”的开源项目
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226936850-c77d7183-0749-4c1c-9875-fd4891842d0c.png" width="500" >
</div>
6. 译解其他开源项目,不在话下
5. 译解其他开源项目
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226935232-6b6a73ce-8900-4aee-93f9-733c7e6fef53.png" height="250" >
<img src="https://user-images.githubusercontent.com/96192199/226969067-968a27c1-1b9c-486b-8b81-ab2de8d3f88a.png" height="250" >
</div>
7. 装饰[live2d](https://github.com/fghrsh/live2d_demo)的小功能(默认关闭,需要修改`config.py`
6. 装饰[live2d](https://github.com/fghrsh/live2d_demo)的小功能(默认关闭,需要修改`config.py`
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/236432361-67739153-73e8-43fe-8111-b61296edabd9.png" width="500" >
</div>
8. 新增MOSS大语言模型支持
7. 新增MOSS大语言模型支持
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/236639178-92836f37-13af-4fdd-984d-b4450fe30336.png" width="500" >
</div>
9. OpenAI图像生成
8. OpenAI图像生成
<div align="center">
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/bc7ab234-ad90-48a0-8d62-f703d9e74665" width="500" >
</div>
10. OpenAI音频解析与总结
9. OpenAI音频解析与总结
<div align="center">
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/709ccf95-3aee-498a-934a-e1c22d3d5d5b" width="500" >
</div>
11. Latex全文校对纠错
10. Latex全文校对纠错
<div align="center">
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/651ccd98-02c9-4464-91e1-77a6b7d1b033" height="200" > ===>
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/476f66d9-7716-4537-b5c1-735372c25adb" height="200">
@ -310,30 +305,32 @@ gpt_academic开发者QQ群-2610599535
- 已知问题
- 某些浏览器翻译插件干扰此软件前端的运行
- 官方Gradio目前有很多兼容性Bug请务必使用requirement.txt安装Gradio
- 官方Gradio目前有很多兼容性Bug请务必使用`requirement.txt`安装Gradio
## 参考与学习
```
代码中参考了很多其他优秀项目中的设计,主要包括
代码中参考了很多其他优秀项目中的设计,顺序不分先后
# 项目1清华ChatGLM-6B:
# 清华ChatGLM-6B:
https://github.com/THUDM/ChatGLM-6B
# 项目2清华JittorLLMs:
# 清华JittorLLMs:
https://github.com/Jittor/JittorLLMs
# 项目3Edge-GPT:
https://github.com/acheong08/EdgeGPT
# 项目4ChuanhuChatGPT:
https://github.com/GaiZhenbiao/ChuanhuChatGPT
# 项目5ChatPaper:
# ChatPaper:
https://github.com/kaixindelele/ChatPaper
# 更多:
# Edge-GPT:
https://github.com/acheong08/EdgeGPT
# ChuanhuChatGPT:
https://github.com/GaiZhenbiao/ChuanhuChatGPT
# Oobabooga one-click installer:
https://github.com/oobabooga/one-click-installers
# More
https://github.com/gradio-app/gradio
https://github.com/fghrsh/live2d_demo
https://github.com/oobabooga/one-click-installers
```

View File

@ -46,7 +46,7 @@ MAX_RETRY = 2
# 模型选择是 (注意: LLM_MODEL是默认选中的模型, 同时它必须被包含在AVAIL_LLM_MODELS切换列表中 )
LLM_MODEL = "gpt-3.5-turbo" # 可选 ↓↓↓
AVAIL_LLM_MODELS = ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "newbing-free", "stack-claude"]
AVAIL_LLM_MODELS = ["gpt-3.5-turbo-16k", "gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "newbing-free", "stack-claude"]
# P.S. 其他可用的模型还包括 ["newbing-free", "jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"]
# 本地LLM模型如ChatGLM的执行方式 CPU/GPU

View File

@ -30,7 +30,7 @@ def 知识库问答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
from .crazy_utils import try_install_deps
try_install_deps(['zh_langchain==0.2.0'])
try_install_deps(['zh_langchain==0.2.1'])
# < --------------------读取参数--------------- >
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
@ -84,7 +84,7 @@ def 读取知识库作答(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
chatbot.append(["依赖不足", "导入依赖失败。正在尝试自动安装,请查看终端的输出或耐心等待..."])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
from .crazy_utils import try_install_deps
try_install_deps(['zh_langchain==0.2.0'])
try_install_deps(['zh_langchain==0.2.1'])
# < ------------------- --------------- >
kai = knowledge_archive_interface()

View File

@ -5,7 +5,7 @@ pj = os.path.join
ARXIV_CACHE_DIR = os.path.expanduser(f"~/arxiv_cache/")
# =================================== 工具函数 ===============================================
沙雕GPT啊别犯这些低级翻译错误 = 'You must to translate "agent" to "智能体". '
专业词汇声明 = 'If the term "agent" is used in this section, it should be translated to "智能体". '
def switch_prompt(pfg, mode):
"""
Generate prompts and system prompts based on the mode for proofreading or translating.
@ -25,7 +25,7 @@ def switch_prompt(pfg, mode):
f"\n\n{frag}" for frag in pfg.sp_file_contents]
sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
elif mode == 'translate_zh':
inputs_array = [r"Below is a section from an English academic paper, translate it into Chinese." + 沙雕GPT啊别犯这些低级翻译错误 +
inputs_array = [r"Below is a section from an English academic paper, translate it into Chinese. " + 专业词汇声明 +
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " +
r"Answer me only with the translated text:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
@ -146,7 +146,7 @@ def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, histo
from .latex_utils import Latex精细分解与转化, 编译Latex
except Exception as e:
chatbot.append([ f"解析项目: {txt}",
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
@ -216,7 +216,7 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
from .latex_utils import Latex精细分解与转化, 编译Latex
except Exception as e:
chatbot.append([ f"解析项目: {txt}",
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return

View File

@ -23,13 +23,67 @@ def split_worker(text, mask, pattern, flags=0):
mask[res.span()[0]:res.span()[1]] = PRESERVE
return text, mask
def split_worker_reverse_caption(text, mask, pattern, flags=0):
def set_transform_area(text, mask, pattern, flags=0):
"""
Move caption area out of preserve area
Add a transform text area in this paper
"""
pattern_compile = re.compile(pattern, flags)
for res in pattern_compile.finditer(text):
mask[res.regs[1][0]:res.regs[1][1]] = TRANSFORM
mask[res.span()[0] : res.span()[1]] = TRANSFORM
return text, mask
def split_worker_careful_brace(text, mask, pattern, flags=0):
"""
Move area into preserve area.
It is better to wrap the curly braces in the capture group, e.g., r"\\captioin(\{.*\})".
"""
pattern_compile = re.compile(pattern, flags)
res = pattern_compile.search(text)
# 确保捕获组存在
if res and len(res.regs) > 1:
brace_level = 0
p = begin = end = res.regs[1][0]
for _ in range(1024 * 16):
if text[p] == "}" and brace_level == 1:
break
elif text[p] == "}":
brace_level -= 1
elif text[p] == "{":
brace_level += 1
p += 1
end = p
mask[begin + 1 : end] = PRESERVE
split_worker_careful_brace(text[end:], mask[end:], pattern, flags=flags)
return text, mask
def split_worker_reverse_careful_brace(text, mask, pattern, flags=0):
"""
Move area out of preserve area.
It is better to wrap the curly braces in the capture group, e.g., r"\\captioin(\{.*\})".
"""
pattern_compile = re.compile(pattern, flags)
res = pattern_compile.search(text)
# 确保捕获组存在
if res and len(res.regs) > 1:
brace_level = 0
p = begin = end = res.regs[1][0]
for _ in range(1024 * 16):
if text[p] == "}" and brace_level == 1:
break
elif text[p] == "}":
brace_level -= 1
elif text[p] == "{":
brace_level += 1
p += 1
end = p
mask[begin + 1 : end] = TRANSFORM
split_worker_reverse_careful_brace(text[end:], mask[end:], pattern, flags=flags)
return text, mask
def split_worker_begin_end(text, mask, pattern, flags=0, limit_n_lines=42):
@ -97,17 +151,19 @@ def 寻找Latex主文件(file_manifest, mode):
else:
continue
raise RuntimeError('无法找到一个主Tex文件包含documentclass关键字')
def rm_comments(main_file):
new_file_remove_comment_lines = []
for l in main_file.splitlines():
# 删除整行的空注释
if l.startswith("%") or (l.startswith(" ") and l.lstrip().startswith("%")):
if l.lstrip().startswith("%"):
pass
else:
new_file_remove_comment_lines.append(l)
main_file = '\n'.join(new_file_remove_comment_lines)
main_file = re.sub(r'(?<!\\)%.*', '', main_file) # 使用正则表达式查找半行注释, 并替换为空字符串
return main_file
def merge_tex_files_(project_foler, main_file, mode):
"""
Merge Tex project recrusively
@ -138,17 +194,23 @@ def merge_tex_files(project_foler, main_file, mode):
main_file = rm_comments(main_file)
if mode == 'translate_zh':
# find paper documentclass
pattern = re.compile(r'\\documentclass.*\n')
match = pattern.search(main_file)
assert match is not None, "Cannot find documentclass statement!"
position = match.end()
add_ctex = '\\usepackage{ctex}\n'
add_url = '\\usepackage{url}\n' if '{url}' not in main_file else ''
main_file = main_file[:position] + add_ctex + add_url + main_file[position:]
# 2 fontset=windows
# fontset=windows
import platform
if platform.system() != 'Windows':
main_file = re.sub(r"\\documentclass\[(.*?)\]{(.*?)}", r"\\documentclass[\1,fontset=windows]{\2}",main_file)
main_file = re.sub(r"\\documentclass{(.*?)}", r"\\documentclass[fontset=windows]{\1}",main_file)
# find paper abstract
pattern = re.compile(r'\\begin\{abstract\}.*\n')
match = pattern.search(main_file)
assert match is not None, "Cannot find paper abstract section!"
return main_file
@ -185,14 +247,39 @@ def fix_content(final_tex, node_string):
if node_string.count('\_') > 0 and node_string.count('\_') > final_tex.count('\_'):
# walk and replace any _ without \
final_tex = re.sub(r"(?<!\\)_", "\\_", final_tex)
if node_string.count('{') != node_string.count('}'):
if final_tex.count('{') != node_string.count('{'):
final_tex = node_string # 出问题了,还原原文
if final_tex.count('}') != node_string.count('}'):
final_tex = node_string # 出问题了,还原原文
def compute_brace_level(string):
# this function count the number of { and }
brace_level = 0
for c in string:
if c == "{": brace_level += 1
elif c == "}": brace_level -= 1
return brace_level
def join_most(tex_t, tex_o):
# this function join translated string and original string when something goes wrong
p_t = 0
p_o = 0
def find_next(string, chars, begin):
p = begin
while p < len(string):
if string[p] in chars: return p, string[p]
p += 1
return None, None
while True:
res1, char = find_next(tex_o, ['{','}'], p_o)
if res1 is None: break
res2, char = find_next(tex_t, [char], p_t)
if res2 is None: break
p_o = res1 + 1
p_t = res2 + 1
return tex_t[:p_t] + tex_o[p_o:]
if compute_brace_level(final_tex) != compute_brace_level(node_string):
# 出问题了,还原部分原文,保证括号正确
final_tex = join_most(final_tex, node_string)
return final_tex
def split_subprocess(txt, project_folder, return_dict):
def split_subprocess(txt, project_folder, return_dict, opts):
"""
break down latex file to a linked list,
each node use a preserve flag to indicate whether it should
@ -202,13 +289,14 @@ def split_subprocess(txt, project_folder, return_dict):
mask = np.zeros(len(txt), dtype=np.uint8) + TRANSFORM
# 吸收title与作者以上的部分
text, mask = split_worker(text, mask, r"(.*?)\\maketitle", re.DOTALL)
text, mask = split_worker(text, mask, r".*?\\begin\{document\}", re.DOTALL)
# 删除iffalse注释
text, mask = split_worker(text, mask, r"\\iffalse(.*?)\\fi", re.DOTALL)
# 吸收在25行以内的begin-end组合
text, mask = split_worker_begin_end(text, mask, r"\\begin\{([a-z\*]*)\}(.*?)\\end\{\1\}", re.DOTALL, limit_n_lines=25)
# 吸收匿名公式
text, mask = split_worker(text, mask, r"\$\$(.*?)\$\$", re.DOTALL)
text, mask = split_worker(text, mask, r"\\\[.*?\\\]", re.DOTALL)
# 吸收其他杂项
text, mask = split_worker(text, mask, r"\\section\{(.*?)\}")
text, mask = split_worker(text, mask, r"\\section\*\{(.*?)\}")
@ -216,6 +304,7 @@ def split_subprocess(txt, project_folder, return_dict):
text, mask = split_worker(text, mask, r"\\subsubsection\{(.*?)\}")
text, mask = split_worker(text, mask, r"\\bibliography\{(.*?)\}")
text, mask = split_worker(text, mask, r"\\bibliographystyle\{(.*?)\}")
text, mask = split_worker(text, mask, r"\\begin\{thebibliography\}.*?\\end\{thebibliography\}", re.DOTALL)
text, mask = split_worker(text, mask, r"\\begin\{lstlisting\}(.*?)\\end\{lstlisting\}", re.DOTALL)
text, mask = split_worker(text, mask, r"\\begin\{wraptable\}(.*?)\\end\{wraptable\}", re.DOTALL)
text, mask = split_worker(text, mask, r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}", re.DOTALL)
@ -235,11 +324,18 @@ def split_subprocess(txt, project_folder, return_dict):
text, mask = split_worker(text, mask, r"\\begin\{equation\*\}(.*?)\\end\{equation\*\}", re.DOTALL)
text, mask = split_worker(text, mask, r"\\item ")
text, mask = split_worker(text, mask, r"\\label\{(.*?)\}")
text, mask = split_worker(text, mask, r"\\begin\{(.*?)\}")
text, mask = split_worker(text, mask, r"\\vspace\{(.*?)\}")
text, mask = split_worker(text, mask, r"\\hspace\{(.*?)\}")
text, mask = set_transform_area(text, mask, r"\\begin\{abstract\}.*?\\end\{abstract\}", re.DOTALL)
text, mask = split_worker_careful_brace(text, mask, r"\\hl(\{.*\})", re.DOTALL)
text, mask = split_worker_reverse_careful_brace(text, mask, r"\\caption(\{.*\})", re.DOTALL)
text, mask = split_worker_reverse_careful_brace(text, mask, r"\\abstract(\{.*\})", re.DOTALL)
text, mask = split_worker(text, mask, r"\\begin\{(.*?)\}")
text, mask = split_worker(text, mask, r"\\end\{(.*?)\}")
# text, mask = split_worker_reverse_caption(text, mask, r"\\caption\{(.*?)\}", re.DOTALL)
root = convert_to_linklist(text, mask)
# 修复括号
@ -365,11 +461,14 @@ class LatexPaperSplit():
if mode == 'translate_zh':
pattern = re.compile(r'\\begin\{abstract\}.*\n')
match = pattern.search(result_string)
if not match:
pattern = re.compile(r'\\abstract\{')
match = pattern.search(result_string)
position = match.end()
result_string = result_string[:position] + self.msg + msg + self.msg_declare + result_string[position:]
return result_string
def split(self, txt, project_folder):
def split(self, txt, project_folder, opts):
"""
break down latex file to a linked list,
each node use a preserve flag to indicate whether it should
@ -381,7 +480,7 @@ class LatexPaperSplit():
return_dict = manager.dict()
p = multiprocessing.Process(
target=split_subprocess,
args=(txt, project_folder, return_dict))
args=(txt, project_folder, return_dict, opts))
p.start()
p.join()
self.nodes = return_dict['nodes']
@ -440,7 +539,7 @@ class LatexPaperFileGroup():
def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, mode='proofread', switch_prompt=None):
def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, mode='proofread', switch_prompt=None, opts=[]):
import time, os, re
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from .latex_utils import LatexPaperFileGroup, merge_tex_files, LatexPaperSplit, 寻找Latex主文件
@ -469,8 +568,10 @@ def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin
f.write(merged_content)
# <-------- 精细切分latex文件 ---------->
chatbot.append((f"Latex文件融合完成", f'[Local Message] 正在精细切分latex文件这需要一段时间计算文档越长耗时越长请耐心等待。'))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
lps = LatexPaperSplit()
res = lps.split(merged_content, project_folder) # 消耗时间的函数
res = lps.split(merged_content, project_folder, opts) # 消耗时间的函数
# <-------- 拆分过长的latex片段 ---------->
pfg = LatexPaperFileGroup()
@ -567,7 +668,7 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
current_dir = os.getcwd()
n_fix = 1
max_try = 32
chatbot.append([f"正在编译PDF文档", f'编译已经开始。当前工作路径为{work_folder}如果程序停顿5分钟以上则大概率是卡死在Latex里面了。不幸卡死时请直接去该路径下取回翻译结果,或者重启之后再度尝试 ...']); yield from update_ui(chatbot=chatbot, history=history)
chatbot.append([f"正在编译PDF文档", f'编译已经开始。当前工作路径为{work_folder}如果程序停顿5分钟以上请直接去该路径下取回翻译结果,或者重启之后再度尝试 ...']); yield from update_ui(chatbot=chatbot, history=history)
chatbot.append([f"正在编译PDF文档", '...']); yield from update_ui(chatbot=chatbot, history=history); time.sleep(1); chatbot[-1] = list(chatbot[-1]) # 刷新界面
yield from update_ui_lastest_msg('编译已经开始...', chatbot, history) # 刷新Gradio前端界面

View File

@ -84,6 +84,15 @@ model_info = {
"token_cnt": get_token_num_gpt35,
},
"gpt-3.5-turbo-16k": {
"fn_with_ui": chatgpt_ui,
"fn_without_ui": chatgpt_noui,
"endpoint": openai_endpoint,
"max_token": 1024*16,
"tokenizer": tokenizer_gpt35,
"token_cnt": get_token_num_gpt35,
},
"gpt-4": {
"fn_with_ui": chatgpt_ui,
"fn_without_ui": chatgpt_noui,

View File

@ -1,5 +1,5 @@
{
"version": 3.4,
"version": 3.41,
"show_feature": true,
"new_feature": "新增最强Arxiv论文翻译插件 <-> 修复gradio复制按钮BUG <-> 修复PDF翻译的BUG, 新增HTML中英双栏对照 <-> 添加了OpenAI图片生成插件 <-> 添加了OpenAI音频转文本总结插件 <-> 通过Slack添加对Claude的支持"
"new_feature": "增加gpt-3.5-16k的支持 <-> 新增最强Arxiv论文翻译插件 <-> 修复gradio复制按钮BUG <-> 修复PDF翻译的BUG, 新增HTML中英双栏对照 <-> 添加了OpenAI图片生成插件 <-> 添加了OpenAI音频转文本总结插件 <-> 通过Slack添加对Claude的支持"
}