Merge branch 'master' of https://github.com/OverKit/gpt_academic into OverKit-master

2023-06-19 14:49:50 +10:00 · dca9ec4bae
commit dca9ec4bae
parent ce0d8b9dab 7fdf0a8e51
7 changed files with 173 additions and 66 deletions
--- a/README.md
+++ b/README.md
@ -16,7 +16,7 @@ To translate this project to arbitary language with GPT, read and run [`multi_la
 >
 > 1.请注意只有**红颜色**标识的函数插件（按钮）才支持读取文件，部分插件位于插件区的**下拉菜单**中。另外我们以**最高优先级**欢迎和处理任何新插件的PR！
 >
-> 2.本项目中每个文件的功能都在自译解[`self_analysis.md`](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)详细说明。随着版本的迭代，您也可以随时自行点击相关函数插件，调用GPT重新生成项目的自我解析报告。常见问题汇总在[`wiki`](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98)当中。[安装方法](#installation)。
+> 2.本项目中每个文件的功能都在自译解[`self_analysis.md`](https://github.com/binary-husky/gpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)详细说明。随着版本的迭代，您也可以随时自行点击相关函数插件，调用GPT重新生成项目的自我解析报告。常见问题汇总在[`wiki`](https://github.com/binary-husky/gpt_academic/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98)当中。[安装方法](#installation)。
 > 
 > 3.本项目兼容并鼓励尝试国产大语言模型chatglm和RWKV, 盘古等等。支持多个api-key共存，可在配置文件中填写如`API_KEY="openai-key1,openai-key2,api2d-key3"`。需要临时更换`API_KEY`时，在输入区输入临时的`API_KEY`然后回车键提交后即可生效。

@ -31,13 +31,13 @@ To translate this project to arbitary language with GPT, read and run [`multi_la
 一键中英互译 | 一键中英互译
 一键代码解释 | 显示代码、解释代码、生成代码、给代码加注释
 [自定义快捷键](https://www.bilibili.com/video/BV14s4y1E7jN) | 支持自定义快捷键
-模块化设计 | 支持自定义强大的[函数插件](https://github.com/binary-husky/chatgpt_academic/tree/master/crazy_functions)，插件支持[热更新](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)
-[自我程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] [一键读懂](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)本项目的源代码
+模块化设计 | 支持自定义强大的[函数插件](https://github.com/binary-husky/gpt_academic/tree/master/crazy_functions)，插件支持[热更新](https://github.com/binary-husky/gpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)
+[自我程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] [一键读懂](https://github.com/binary-husky/gpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)本项目的源代码
 [程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] 一键可以剖析其他Python/C/C++/Java/Lua/...项目树
 读论文、[翻译](https://www.bilibili.com/video/BV1KT411x7Wn)论文 | [函数插件] 一键解读latex/pdf论文全文并生成摘要
 Latex全文[翻译](https://www.bilibili.com/video/BV1nk4y1Y7Js/)、[润色](https://www.bilibili.com/video/BV1FT411H7c5/) | [函数插件] 一键翻译或润色latex论文
 批量注释生成 | [函数插件] 一键批量生成函数注释
-Markdown[中英互译](https://www.bilibili.com/video/BV1yo4y157jV/) | [函数插件] 看到上面5种语言的[README](https://github.com/binary-husky/chatgpt_academic/blob/master/docs/README_EN.md)了吗？
+Markdown[中英互译](https://www.bilibili.com/video/BV1yo4y157jV/) | [函数插件] 看到上面5种语言的[README](https://github.com/binary-husky/gpt_academic/blob/master/docs/README_EN.md)了吗？
 chat分析报告生成 | [函数插件] 运行后自动生成总结汇报
 [PDF论文全文翻译功能](https://www.bilibili.com/video/BV1KT411x7Wn) | [函数插件] PDF论文提取题目&摘要+翻译全文（多线程）
 [Arxiv小助手](https://www.bilibili.com/video/BV1LM4y1279X) | [函数插件] 输入arxiv文章url即可一键翻译摘要+下载PDF
@ -46,8 +46,8 @@ chat分析报告生成 | [函数插件] 运行后自动生成总结汇报
 ⭐Arxiv论文精细翻译 | [函数插件] 一键[以超高质量翻译arxiv论文](https://www.bilibili.com/video/BV1dz4y1v77A/)，迄今为止最好的论文翻译工具⭐
 公式/图片/表格显示 | 可以同时显示公式的[tex形式和渲染形式](https://user-images.githubusercontent.com/96192199/230598842-1d7fcddd-815d-40ee-af60-baf488a199df.png)，支持公式、代码高亮
 多线程函数插件支持 | 支持多线调用chatgpt，一键处理[海量文本](https://www.bilibili.com/video/BV1FT411H7c5/)或程序
-启动暗色gradio[主题](https://github.com/binary-husky/chatgpt_academic/issues/173) | 在浏览器url后面添加```/?__theme=dark```可以切换dark主题
-[多LLM模型](https://www.bilibili.com/video/BV1wT411p7yf)支持，[API2D](https://api2d.com/)接口支持 | 同时被GPT3.5、GPT4、[清华ChatGLM](https://github.com/THUDM/ChatGLM-6B)、[复旦MOSS](https://github.com/OpenLMLab/MOSS)同时伺候的感觉一定会很不错吧？
+启动暗色gradio[主题](https://github.com/binary-husky/gpt_academic/issues/173) | 在浏览器url后面添加```/?__theme=dark```可以切换dark主题
+[多LLM模型](https://www.bilibili.com/video/BV1wT411p7yf)支持 | 同时被GPT3.5、GPT4、[清华ChatGLM](https://github.com/THUDM/ChatGLM-6B)、[复旦MOSS](https://github.com/OpenLMLab/MOSS)同时伺候的感觉一定会很不错吧？
 更多LLM模型接入，支持[huggingface部署](https://huggingface.co/spaces/qingxu98/gpt-academic) | 加入Newbing接口(新必应)，引入清华[Jittorllms](https://github.com/Jittor/JittorLLMs)支持[LLaMA](https://github.com/facebookresearch/llama)，[RWKV](https://github.com/BlinkDL/ChatRWKV)和[盘古α](https://openi.org.cn/pangu/)
 更多新功能展示(图像生成等) …… | 见本文档结尾处 ……

@ -91,8 +91,8 @@ chat分析报告生成 | [函数插件] 运行后自动生成总结汇报

 1. 下载项目
 ```sh
-git clone https://github.com/binary-husky/chatgpt_academic.git
-cd chatgpt_academic
+git clone https://github.com/binary-husky/gpt_academic.git
+cd gpt_academic
 ```

 2. 配置API_KEY
@ -150,8 +150,8 @@ python main.py
 1. 仅ChatGPT（推荐大多数人选择）

 ``` sh
-git clone https://github.com/binary-husky/chatgpt_academic.git  # 下载项目
-cd chatgpt_academic                                 # 进入路径
+git clone https://github.com/binary-husky/gpt_academic.git  # 下载项目
+cd gpt_academic                                 # 进入路径
 nano config.py                                      # 用任意文本编辑器编辑config.py, 配置 “Proxy”， “API_KEY” 以及 “WEB_PORT” (例如50923) 等
 docker build -t gpt-academic .                      # 安装

@ -188,10 +188,10 @@ docker-compose up
 按照`config.py`中的说明配置API_URL_REDIRECT即可。

 4. 远程云服务器部署（需要云服务器知识与经验）。
-请访问[部署wiki-1](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)
+请访问[部署wiki-1](https://github.com/binary-husky/gpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)

 5. 使用WSL2（Windows Subsystem for Linux 子系统）。
-请访问[部署wiki-2](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
+请访问[部署wiki-2](https://github.com/binary-husky/gpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)

 6. 如何在二级网址（如`http://localhost/subpath`）下运行。
 请访问[FastAPI运行说明](docs/WithFastapi.md)
@ -220,7 +220,7 @@ docker-compose up

 编写强大的函数插件来执行任何你想得到的和想不到的任务。
 本项目的插件编写、调试难度很低，只要您具备一定的python基础知识，就可以仿照我们提供的模板实现自己的插件功能。
-详情请参考[函数插件指南](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)。
+详情请参考[函数插件指南](https://github.com/binary-husky/gpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)。

 ---
 # Latest Update
@ -228,7 +228,7 @@ docker-compose up

 1. 对话保存功能。在函数插件区调用 `保存当前的对话` 即可将当前对话保存为可读+可复原的html文件，
 另外在函数插件区（下拉菜单）调用 `载入对话历史存档` ，即可还原之前的会话。
-Tip：不指定文件直接点击 `载入对话历史存档` 可以查看历史html存档缓存，点击 `删除所有本地对话历史记录` 可以删除所有html存档缓存。
+Tip：不指定文件直接点击 `载入对话历史存档` 可以查看历史html存档缓存。
 <div align="center">
 <img src="https://user-images.githubusercontent.com/96192199/235222390-24a9acc0-680f-49f5-bc81-2f3161f1e049.png" width="500" >
 </div>
@ -251,38 +251,33 @@ Tip：不指定文件直接点击 `载入对话历史存档` 可以查看历史h
 <img src="https://user-images.githubusercontent.com/96192199/227504931-19955f78-45cd-4d1c-adac-e71e50957915.png" height="400" >
 </div>

-5. 这是一个能够“自我译解”的开源项目
-<div align="center">
-<img src="https://user-images.githubusercontent.com/96192199/226936850-c77d7183-0749-4c1c-9875-fd4891842d0c.png" width="500" >
-</div>
-
-6. 译解其他开源项目，不在话下
+5. 译解其他开源项目
 <div align="center">
 <img src="https://user-images.githubusercontent.com/96192199/226935232-6b6a73ce-8900-4aee-93f9-733c7e6fef53.png" height="250" >
 <img src="https://user-images.githubusercontent.com/96192199/226969067-968a27c1-1b9c-486b-8b81-ab2de8d3f88a.png" height="250" >
 </div>

-7. 装饰[live2d](https://github.com/fghrsh/live2d_demo)的小功能（默认关闭，需要修改`config.py`）
+6. 装饰[live2d](https://github.com/fghrsh/live2d_demo)的小功能（默认关闭，需要修改`config.py`）
 <div align="center">
 <img src="https://user-images.githubusercontent.com/96192199/236432361-67739153-73e8-43fe-8111-b61296edabd9.png" width="500" >
 </div>

-8. 新增MOSS大语言模型支持
+7. 新增MOSS大语言模型支持
 <div align="center">
 <img src="https://user-images.githubusercontent.com/96192199/236639178-92836f37-13af-4fdd-984d-b4450fe30336.png" width="500" >
 </div>

-9. OpenAI图像生成
+8. OpenAI图像生成
 <div align="center">
 <img src="https://github.com/binary-husky/gpt_academic/assets/96192199/bc7ab234-ad90-48a0-8d62-f703d9e74665" width="500" >
 </div>

-10. OpenAI音频解析与总结
+9. OpenAI音频解析与总结
 <div align="center">
 <img src="https://github.com/binary-husky/gpt_academic/assets/96192199/709ccf95-3aee-498a-934a-e1c22d3d5d5b" width="500" >
 </div>

-11. Latex全文校对纠错
+10. Latex全文校对纠错
 <div align="center">
 <img src="https://github.com/binary-husky/gpt_academic/assets/96192199/651ccd98-02c9-4464-91e1-77a6b7d1b033" height="200" > ===>
 <img src="https://github.com/binary-husky/gpt_academic/assets/96192199/476f66d9-7716-4537-b5c1-735372c25adb" height="200">
@ -310,30 +305,32 @@ gpt_academic开发者QQ群-2：610599535

 - 已知问题
    - 某些浏览器翻译插件干扰此软件前端的运行
-    - 官方Gradio目前有很多兼容性Bug，请务必使用requirement.txt安装Gradio
+    - 官方Gradio目前有很多兼容性Bug，请务必使用`requirement.txt`安装Gradio

 ## 参考与学习

 ```
-代码中参考了很多其他优秀项目中的设计，主要包括：
+代码中参考了很多其他优秀项目中的设计，顺序不分先后：

-# 项目1：清华ChatGLM-6B:
+# 清华ChatGLM-6B:
 https://github.com/THUDM/ChatGLM-6B

-# 项目2：清华JittorLLMs:
+# 清华JittorLLMs:
 https://github.com/Jittor/JittorLLMs

-# 项目3：Edge-GPT:
-https://github.com/acheong08/EdgeGPT
-
-# 项目4：ChuanhuChatGPT:
-https://github.com/GaiZhenbiao/ChuanhuChatGPT
-
-# 项目5：ChatPaper:
+# ChatPaper:
 https://github.com/kaixindelele/ChatPaper

-# 更多：
+# Edge-GPT:
+https://github.com/acheong08/EdgeGPT
+
+# ChuanhuChatGPT:
+https://github.com/GaiZhenbiao/ChuanhuChatGPT
+
+# Oobabooga one-click installer:
+https://github.com/oobabooga/one-click-installers
+
+# More：
 https://github.com/gradio-app/gradio
 https://github.com/fghrsh/live2d_demo
-https://github.com/oobabooga/one-click-installers
 ```
--- a/config.py
+++ b/config.py
@ -46,7 +46,7 @@ MAX_RETRY = 2

 # 模型选择是 (注意: LLM_MODEL是默认选中的模型, 同时它必须被包含在AVAIL_LLM_MODELS切换列表中 )
 LLM_MODEL = "gpt-3.5-turbo" # 可选 ↓↓↓
-AVAIL_LLM_MODELS = ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "newbing-free", "stack-claude"]
+AVAIL_LLM_MODELS = ["gpt-3.5-turbo-16k", "gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "newbing-free", "stack-claude"]
 # P.S. 其他可用的模型还包括 ["newbing-free", "jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"]

 # 本地LLM模型如ChatGLM的执行方式 CPU/GPU
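Note on the `config.py` hunk above: the comment states that `LLM_MODEL` must be included in `AVAIL_LLM_MODELS`. A minimal sketch of a startup check for that invariant follows. `validate_model_config` is a hypothetical helper introduced here for illustration, not a function in this repository, and the shortened model list is only sample data.

```python
def validate_model_config(llm_model, avail_llm_models):
    """Return a copy of the model list, raising if the default is not selectable.

    Hypothetical helper for illustration; not part of the repository.
    """
    if llm_model not in avail_llm_models:
        raise ValueError(
            f"LLM_MODEL={llm_model!r} must be listed in AVAIL_LLM_MODELS"
        )
    return list(avail_llm_models)


# Sample values in the shape used by config.py (list shortened for brevity).
LLM_MODEL = "gpt-3.5-turbo"
AVAIL_LLM_MODELS = ["gpt-3.5-turbo-16k", "gpt-3.5-turbo", "gpt-4", "chatglm"]
checked = validate_model_config(LLM_MODEL, AVAIL_LLM_MODELS)
```

Failing early with a clear message is one option; silently appending the default model to the list would be another, at the cost of hiding a configuration mistake.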
--- a/crazy_functions/Langchain知识库.py
+++ b/crazy_functions/Langchain知识库.py
@ -30,7 +30,7 @@ def 知识库问答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
        )
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
        from .crazy_utils import try_install_deps
-        try_install_deps(['zh_langchain==0.2.0'])
+        try_install_deps(['zh_langchain==0.2.1'])
    
    # < --------------------读取参数--------------- >
    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
@ -84,7 +84,7 @@ def 读取知识库作答(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
        chatbot.append(["依赖不足", "导入依赖失败。正在尝试自动安装，请查看终端的输出或耐心等待..."])
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
        from .crazy_utils import try_install_deps
-        try_install_deps(['zh_langchain==0.2.0'])
+        try_install_deps(['zh_langchain==0.2.1'])

    # < -------------------  --------------- >
    kai = knowledge_archive_interface()
--- a/crazy_functions/Latex输出PDF结果.py
+++ b/crazy_functions/Latex输出PDF结果.py
@ -5,7 +5,7 @@ pj = os.path.join
 ARXIV_CACHE_DIR = os.path.expanduser(f"~/arxiv_cache/")

 # =================================== 工具函数 ===============================================
-沙雕GPT啊别犯这些低级翻译错误  = 'You must to translate "agent" to "智能体". '
+专业词汇声明  = 'If the term "agent" is used in this section, it should be translated to "智能体". '
 def switch_prompt(pfg, mode):
    """
    Generate prompts and system prompts based on the mode for proofreading or translating.
@ -25,7 +25,7 @@ def switch_prompt(pfg, mode):
                        f"\n\n{frag}" for frag in pfg.sp_file_contents]
        sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
    elif mode == 'translate_zh':
-        inputs_array = [r"Below is a section from an English academic paper, translate it into Chinese." + 沙雕GPT啊别犯这些低级翻译错误 + 
+        inputs_array = [r"Below is a section from an English academic paper, translate it into Chinese. " + 专业词汇声明 + 
                        r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " + 
                        r"Answer me only with the translated text:" + 
                        f"\n\n{frag}" for frag in pfg.sp_file_contents]
@ -146,7 +146,7 @@ def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, histo
        from .latex_utils import Latex精细分解与转化, 编译Latex
    except Exception as e:
        chatbot.append([ f"解析项目: {txt}",
-            f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
+            f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
        return
    
@ -216,7 +216,7 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
        from .latex_utils import Latex精细分解与转化, 编译Latex
    except Exception as e:
        chatbot.append([ f"解析项目: {txt}",
-            f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
+            f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
        return
    
--- a/crazy_functions/latex_utils.py
+++ b/crazy_functions/latex_utils.py
@ -23,13 +23,67 @@ def split_worker(text, mask, pattern, flags=0):
        mask[res.span()[0]:res.span()[1]] = PRESERVE
    return text, mask

-def split_worker_reverse_caption(text, mask, pattern, flags=0):
+def set_transform_area(text, mask, pattern, flags=0):
    """
-    Move caption area out of preserve area 
+    Add a transform text area in this paper
    """
    pattern_compile = re.compile(pattern, flags)
    for res in pattern_compile.finditer(text):
-        mask[res.regs[1][0]:res.regs[1][1]] = TRANSFORM
+        mask[res.span()[0] : res.span()[1]] = TRANSFORM
+    return text, mask
+
+
+def split_worker_careful_brace(text, mask, pattern, flags=0):
+    """
+    Move area into preserve area.
+    It is better to wrap the curly braces in the capture group, e.g., r"\\caption(\{.*\})".
+    """
+    pattern_compile = re.compile(pattern, flags)
+    res = pattern_compile.search(text)
+
+    # Make sure the capture group exists
+    if res and len(res.regs) > 1:
+        brace_level = 0
+        p = begin = end = res.regs[1][0]
+        for _ in range(1024 * 16):
+            if text[p] == "}" and brace_level == 1:
+                break
+            elif text[p] == "}":
+                brace_level -= 1
+            elif text[p] == "{":
+                brace_level += 1
+            p += 1
+        end = p
+        mask[begin + 1 : end] = PRESERVE
+        split_worker_careful_brace(text[end:], mask[end:], pattern, flags=flags)
+
+    return text, mask
+
+
+def split_worker_reverse_careful_brace(text, mask, pattern, flags=0):
+    """
+    Move area out of preserve area.
+    It is better to wrap the curly braces in the capture group, e.g., r"\\caption(\{.*\})".
+    """
+    pattern_compile = re.compile(pattern, flags)
+    res = pattern_compile.search(text)
+
+    # Make sure the capture group exists
+    if res and len(res.regs) > 1:
+        brace_level = 0
+        p = begin = end = res.regs[1][0]
+        for _ in range(1024 * 16):
+            if text[p] == "}" and brace_level == 1:
+                break
+            elif text[p] == "}":
+                brace_level -= 1
+            elif text[p] == "{":
+                brace_level += 1
+            p += 1
+        end = p
+        mask[begin + 1 : end] = TRANSFORM
+        split_worker_reverse_careful_brace(text[end:], mask[end:], pattern, flags=flags)
+
    return text, mask

 def split_worker_begin_end(text, mask, pattern, flags=0, limit_n_lines=42):
@ -97,17 +151,19 @@ def 寻找Latex主文件(file_manifest, mode):
        else:
            continue
    raise RuntimeError('无法找到一个主Tex文件（包含documentclass关键字）')
+
 def rm_comments(main_file):
    new_file_remove_comment_lines = []
    for l in main_file.splitlines():
        # 删除整行的空注释
-        if l.startswith("%") or (l.startswith(" ") and l.lstrip().startswith("%")):
+        if l.lstrip().startswith("%"):
            pass
        else:
            new_file_remove_comment_lines.append(l)
    main_file = '\n'.join(new_file_remove_comment_lines)
    main_file = re.sub(r'(?<!\\)%.*', '', main_file)  # 使用正则表达式查找半行注释, 并替换为空字符串
    return main_file
+
 def merge_tex_files_(project_foler, main_file, mode):
    """
    Merge Tex project recursively
@ -138,17 +194,23 @@ def merge_tex_files(project_foler, main_file, mode):
    main_file = rm_comments(main_file)

    if mode == 'translate_zh':
+        # find paper documentclass
        pattern = re.compile(r'\\documentclass.*\n')
        match = pattern.search(main_file)
+        assert match is not None, "Cannot find documentclass statement!"
        position = match.end()
        add_ctex = '\\usepackage{ctex}\n'
        add_url = '\\usepackage{url}\n' if '{url}' not in main_file else ''
        main_file = main_file[:position] + add_ctex + add_url + main_file[position:]
-        # 2 fontset=windows
+        # fontset=windows
        import platform
        if platform.system() != 'Windows':
            main_file = re.sub(r"\\documentclass\[(.*?)\]{(.*?)}", r"\\documentclass[\1,fontset=windows]{\2}",main_file)
            main_file = re.sub(r"\\documentclass{(.*?)}", r"\\documentclass[fontset=windows]{\1}",main_file)
+        # find paper abstract
+        pattern = re.compile(r'\\begin\{abstract\}.*\n')
+        match = pattern.search(main_file)
+        assert match is not None, "Cannot find paper abstract section!"
    return main_file


@ -185,14 +247,39 @@ def fix_content(final_tex, node_string):
    if node_string.count('\_') > 0 and node_string.count('\_') > final_tex.count('\_'):
        # walk and replace any _ without \
        final_tex = re.sub(r"(?<!\\)_", "\\_", final_tex)
-    if node_string.count('{') != node_string.count('}'):
-        if final_tex.count('{') != node_string.count('{'):
-            final_tex = node_string # 出问题了，还原原文
-        if final_tex.count('}') != node_string.count('}'):
-            final_tex = node_string # 出问题了，还原原文
+
+    def compute_brace_level(string):
+        # Compute the net brace level: +1 for each "{", -1 for each "}"
+        brace_level = 0
+        for c in string:
+            if c == "{": brace_level += 1
+            elif c == "}": brace_level -= 1
+        return brace_level
+    def join_most(tex_t, tex_o):
+        # Keep the translated text up to the last brace that lines up with the original, then append the rest of the original
+        p_t = 0
+        p_o = 0
+        def find_next(string, chars, begin):
+            p = begin
+            while p < len(string):
+                if string[p] in chars: return p, string[p]
+                p += 1
+            return None, None
+        while True:
+            res1, char = find_next(tex_o, ['{','}'], p_o)
+            if res1 is None: break
+            res2, char = find_next(tex_t, [char], p_t)
+            if res2 is None: break
+            p_o = res1 + 1
+            p_t = res2 + 1
+        return tex_t[:p_t] + tex_o[p_o:]
+
+    if compute_brace_level(final_tex) != compute_brace_level(node_string):
+        # Brace levels differ: restore part of the original text so the braces stay balanced
+        final_tex = join_most(final_tex, node_string)
    return final_tex

-def split_subprocess(txt, project_folder, return_dict):
+def split_subprocess(txt, project_folder, return_dict, opts):
    """
    break down latex file to a linked list,
    each node use a preserve flag to indicate whether it should
@ -202,13 +289,14 @@ def split_subprocess(txt, project_folder, return_dict):
    mask = np.zeros(len(txt), dtype=np.uint8) + TRANSFORM

    # 吸收title与作者以上的部分
-    text, mask = split_worker(text, mask, r"(.*?)\\maketitle", re.DOTALL)
+    text, mask = split_worker(text, mask, r".*?\\begin\{document\}", re.DOTALL)
    # 删除iffalse注释
    text, mask = split_worker(text, mask, r"\\iffalse(.*?)\\fi", re.DOTALL)
    # 吸收在25行以内的begin-end组合
    text, mask = split_worker_begin_end(text, mask, r"\\begin\{([a-z\*]*)\}(.*?)\\end\{\1\}", re.DOTALL, limit_n_lines=25)
    # 吸收匿名公式
    text, mask = split_worker(text, mask, r"\$\$(.*?)\$\$", re.DOTALL)
+    text, mask = split_worker(text, mask, r"\\\[.*?\\\]", re.DOTALL)
    # 吸收其他杂项
    text, mask = split_worker(text, mask, r"\\section\{(.*?)\}")
    text, mask = split_worker(text, mask, r"\\section\*\{(.*?)\}")
@ -216,6 +304,7 @@ def split_subprocess(txt, project_folder, return_dict):
    text, mask = split_worker(text, mask, r"\\subsubsection\{(.*?)\}")
    text, mask = split_worker(text, mask, r"\\bibliography\{(.*?)\}")
    text, mask = split_worker(text, mask, r"\\bibliographystyle\{(.*?)\}")
+    text, mask = split_worker(text, mask, r"\\begin\{thebibliography\}.*?\\end\{thebibliography\}", re.DOTALL)
    text, mask = split_worker(text, mask, r"\\begin\{lstlisting\}(.*?)\\end\{lstlisting\}", re.DOTALL)
    text, mask = split_worker(text, mask, r"\\begin\{wraptable\}(.*?)\\end\{wraptable\}", re.DOTALL)
    text, mask = split_worker(text, mask, r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}", re.DOTALL)
@ -235,11 +324,18 @@ def split_subprocess(txt, project_folder, return_dict):
    text, mask = split_worker(text, mask, r"\\begin\{equation\*\}(.*?)\\end\{equation\*\}", re.DOTALL)
    text, mask = split_worker(text, mask, r"\\item ")
    text, mask = split_worker(text, mask, r"\\label\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\begin\{(.*?)\}")
    text, mask = split_worker(text, mask, r"\\vspace\{(.*?)\}")
    text, mask = split_worker(text, mask, r"\\hspace\{(.*?)\}")
+
+    text, mask = set_transform_area(text, mask, r"\\begin\{abstract\}.*?\\end\{abstract\}", re.DOTALL)
+
+    text, mask = split_worker_careful_brace(text, mask, r"\\hl(\{.*\})", re.DOTALL)
+    text, mask = split_worker_reverse_careful_brace(text, mask, r"\\caption(\{.*\})", re.DOTALL)
+    text, mask = split_worker_reverse_careful_brace(text, mask, r"\\abstract(\{.*\})", re.DOTALL)
+
+    text, mask = split_worker(text, mask, r"\\begin\{(.*?)\}")
    text, mask = split_worker(text, mask, r"\\end\{(.*?)\}")
-    # text, mask = split_worker_reverse_caption(text, mask, r"\\caption\{(.*?)\}", re.DOTALL)
+
    root = convert_to_linklist(text, mask)

    # 修复括号
@ -365,11 +461,14 @@ class LatexPaperSplit():
        if mode == 'translate_zh':
            pattern = re.compile(r'\\begin\{abstract\}.*\n')
            match = pattern.search(result_string)
+            if not match:
+                pattern = re.compile(r'\\abstract\{')
+                match = pattern.search(result_string)
            position = match.end()
            result_string = result_string[:position] + self.msg + msg + self.msg_declare + result_string[position:]
        return result_string

-    def split(self, txt, project_folder): 
+    def split(self, txt, project_folder, opts): 
        """
        break down latex file to a linked list,
        each node use a preserve flag to indicate whether it should
@ -381,7 +480,7 @@ class LatexPaperSplit():
        return_dict = manager.dict()
        p = multiprocessing.Process(
            target=split_subprocess, 
-            args=(txt, project_folder, return_dict))
+            args=(txt, project_folder, return_dict, opts))
        p.start()
        p.join()
        self.nodes = return_dict['nodes']
@ -440,7 +539,7 @@ class LatexPaperFileGroup():



-def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, mode='proofread', switch_prompt=None):
+def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, mode='proofread', switch_prompt=None, opts=[]):
    import time, os, re
    from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
    from .latex_utils import LatexPaperFileGroup, merge_tex_files, LatexPaperSplit, 寻找Latex主文件
@ -469,8 +568,10 @@ def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin
        f.write(merged_content)

    #  <-------- 精细切分latex文件 ----------> 
+    chatbot.append((f"Latex文件融合完成", f'[Local Message] 正在精细切分latex文件，这需要一段时间计算，文档越长耗时越长，请耐心等待。'))
+    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
    lps = LatexPaperSplit()
-    res = lps.split(merged_content, project_folder) # 消耗时间的函数
+    res = lps.split(merged_content, project_folder, opts) # 消耗时间的函数

    #  <-------- 拆分过长的latex片段 ----------> 
    pfg = LatexPaperFileGroup()
@ -567,7 +668,7 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
    current_dir = os.getcwd()
    n_fix = 1
    max_try = 32
-    chatbot.append([f"正在编译PDF文档", f'编译已经开始。当前工作路径为{work_folder}，如果程序停顿5分钟以上，则大概率是卡死在Latex里面了。不幸卡死时请直接去该路径下取回翻译结果，或者重启之后再度尝试 ...']); yield from update_ui(chatbot=chatbot, history=history)
+    chatbot.append([f"正在编译PDF文档", f'编译已经开始。当前工作路径为{work_folder}，如果程序停顿5分钟以上，请直接去该路径下取回翻译结果，或者重启之后再度尝试 ...']); yield from update_ui(chatbot=chatbot, history=history)
    chatbot.append([f"正在编译PDF文档", '...']); yield from update_ui(chatbot=chatbot, history=history); time.sleep(1); chatbot[-1] = list(chatbot[-1]) # 刷新界面
    yield from update_ui_lastest_msg('编译已经开始...', chatbot, history)   # 刷新Gradio前端界面
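Note on the careful-brace helpers added to `latex_utils.py` above: they walk forward from the opening brace of a command such as `\caption{...}` and track nesting depth so that inner groups like `\textbf{...}` do not end the match early. The sketch below illustrates the same idea in isolation. It is not the code from this commit: `find_matching_brace` is a hypothetical name, it stops at the end of the string rather than after a fixed iteration count, and it additionally skips backslash-escaped braces.

```python
def find_matching_brace(text, open_pos):
    """Return the index of the '}' that closes the '{' at open_pos.

    Returns -1 when no matching brace exists. Braces preceded by a
    backslash are skipped. Hypothetical helper for illustration only.
    """
    if open_pos >= len(text) or text[open_pos] != "{":
        raise ValueError("open_pos must point at an opening brace")
    level = 0
    p = open_pos
    while p < len(text):
        c = text[p]
        if c == "\\":
            p += 2  # skip the backslash and the character it escapes
            continue
        if c == "{":
            level += 1
        elif c == "}":
            level -= 1
            if level == 0:
                return p
        p += 1
    return -1


sample = r"\caption{Accuracy of \textbf{our} method.} More text {x}"
start = sample.index("{")
end = find_matching_brace(sample, start)
inner = sample[start + 1 : end]  # text between the outer caption braces
```

A non-greedy regex such as `\{(.*?)\}` would stop at the first `}` inside `\textbf{our}`, which is why an explicit depth counter is used for captions and abstracts.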

--- a/request_llm/bridge_all.py
+++ b/request_llm/bridge_all.py
@ -84,6 +84,15 @@ model_info = {
        "token_cnt": get_token_num_gpt35,
    },
    
+    "gpt-3.5-turbo-16k": {
+        "fn_with_ui": chatgpt_ui,
+        "fn_without_ui": chatgpt_noui,
+        "endpoint": openai_endpoint,
+        "max_token": 1024*16,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
+
    "gpt-4": {
        "fn_with_ui": chatgpt_ui,
        "fn_without_ui": chatgpt_noui,
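Note on the `model_info` hunk above: the new `gpt-3.5-turbo-16k` entry reuses the existing handlers and tokenizer and changes only `max_token` to `1024*16`. The sketch below shows how such a registry can be consulted before sending a request. Everything except the key names `max_token` and `token_cnt` is an assumption: `MODEL_INFO`, `count_tokens_stub`, and `fits_in_context` are hypothetical, the word-count tokenizer is a stand-in for a real one, and the 4096 limit for `gpt-3.5-turbo` is an assumed value that this diff does not show.

```python
def count_tokens_stub(text):
    # Stand-in tokenizer: counts whitespace-separated words, not real tokens.
    return len(text.split())


# Hypothetical registry mirroring the shape of model_info in bridge_all.py.
MODEL_INFO = {
    "gpt-3.5-turbo": {
        "max_token": 4096,  # assumed value, not taken from this diff
        "token_cnt": count_tokens_stub,
    },
    "gpt-3.5-turbo-16k": {
        "max_token": 1024 * 16,  # value from the added entry
        "token_cnt": count_tokens_stub,
    },
}


def fits_in_context(model, text, reserve=0):
    """Return True if text plus a reserved reply budget fits the model's limit."""
    info = MODEL_INFO[model]
    return info["token_cnt"](text) + reserve <= info["max_token"]


long_input = "word " * 5000
use_16k = (not fits_in_context("gpt-3.5-turbo", long_input)
           and fits_in_context("gpt-3.5-turbo-16k", long_input))
```

Keeping the limit in the registry rather than in the request code means a new model variant only needs a new dictionary entry, which matches how this commit adds the 16k model.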
--- a/4
+++ b/4
@ -1,5 +1,5 @@
 {
-  "version": 3.4,
+  "version": 3.41,
  "show_feature": true,
-  "new_feature": "新增最强Arxiv论文翻译插件 <-> 修复gradio复制按钮BUG <-> 修复PDF翻译的BUG, 新增HTML中英双栏对照 <-> 添加了OpenAI图片生成插件 <-> 添加了OpenAI音频转文本总结插件 <-> 通过Slack添加对Claude的支持"
+  "new_feature": "增加gpt-3.5-16k的支持 <-> 新增最强Arxiv论文翻译插件 <-> 修复gradio复制按钮BUG <-> 修复PDF翻译的BUG, 新增HTML中英双栏对照 <-> 添加了OpenAI图片生成插件 <-> 添加了OpenAI音频转文本总结插件 <-> 通过Slack添加对Claude的支持"
 }