diff --git a/README.md b/README.md index 04b2f63..45048f6 100644 --- a/README.md +++ b/README.md @@ -101,9 +101,11 @@ cd gpt_academic 2. 配置API_KEY -在`config.py`中,配置API KEY等设置,[点击查看特殊网络环境设置方法](https://github.com/binary-husky/gpt_academic/issues/1) 。 +在`config.py`中,配置API KEY等设置,[点击查看特殊网络环境设置方法](https://github.com/binary-husky/gpt_academic/issues/1) 。[Wiki页面](https://github.com/binary-husky/gpt_academic/wiki/%E9%A1%B9%E7%9B%AE%E9%85%8D%E7%BD%AE%E8%AF%B4%E6%98%8E)。 -(P.S. 程序运行时会优先检查是否存在名为`config_private.py`的私密配置文件,并用其中的配置覆盖`config.py`的同名配置。因此,如果您能理解我们的配置读取逻辑,我们强烈建议您在`config.py`旁边创建一个名为`config_private.py`的新配置文件,并把`config.py`中的配置转移(复制)到`config_private.py`中(仅复制您修改过的配置条目即可)。`config_private.py`不受git管控,可以让您的隐私信息更加安全。P.S.项目同样支持通过`环境变量`配置大多数选项,环境变量的书写格式参考`docker-compose`文件。读取优先级: `环境变量` > `config_private.py` > `config.py`) +「 程序会优先检查是否存在名为`config_private.py`的私密配置文件,并用其中的配置覆盖`config.py`的同名配置。如您能理解该读取逻辑,我们强烈建议您在`config.py`旁边创建一个名为`config_private.py`的新配置文件,并把`config.py`中的配置转移(复制)到`config_private.py`中(仅复制您修改过的配置条目即可)。 」 + +「 支持通过`环境变量`配置项目,环境变量的书写格式参考`docker-compose.yml`文件或者我们的[Wiki页面](https://github.com/binary-husky/gpt_academic/wiki/%E9%A1%B9%E7%9B%AE%E9%85%8D%E7%BD%AE%E8%AF%B4%E6%98%8E)。配置读取优先级: `环境变量` > `config_private.py` > `config.py`。 」 3. 安装依赖 @@ -111,7 +113,7 @@ cd gpt_academic # (选择I: 如熟悉python)(python版本3.9以上,越新越好),备注:使用官方pip源或者阿里pip源,临时换源方法:python -m pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ python -m pip install -r requirements.txt -# (选择II: 如不熟悉python)使用anaconda,步骤也是类似的 (https://www.bilibili.com/video/BV1rc411W7Dr): +# (选择II: 使用Anaconda)步骤也是类似的 (https://www.bilibili.com/video/BV1rc411W7Dr): conda create -n gptac_venv python=3.11 # 创建anaconda环境 conda activate gptac_venv # 激活anaconda环境 python -m pip install -r requirements.txt # 这个步骤和pip安装一样的步骤 @@ -149,26 +151,25 @@ python main.py ### 安装方法II:使用Docker +0. 部署项目的全部能力(这个是包含cuda和latex的大型镜像。如果您网速慢、硬盘小或没有显卡,则不推荐使用这个,建议使用方案1)(需要熟悉[Nvidia Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian)运行时) [![fullcapacity](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-all-capacity.yml/badge.svg?branch=master)](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-audio-assistant.yml) -1. 仅ChatGPT(推荐大多数人选择,等价于docker-compose方案1) +``` sh +# 修改docker-compose.yml,保留方案0并删除其他方案。修改docker-compose.yml中方案0的配置,参考其中注释即可 +docker-compose up +``` + +1. 仅ChatGPT+文心一言+spark等在线模型(推荐大多数人选择) [![basic](https://github.com/binary-husky/gpt_academic/actions/workflows/build-without-local-llms.yml/badge.svg?branch=master)](https://github.com/binary-husky/gpt_academic/actions/workflows/build-without-local-llms.yml) [![basiclatex](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-latex.yml/badge.svg?branch=master)](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-latex.yml) [![basicaudio](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-audio-assistant.yml/badge.svg?branch=master)](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-audio-assistant.yml) - ``` sh -git clone --depth=1 https://github.com/binary-husky/gpt_academic.git # 下载项目 -cd gpt_academic # 进入路径 -nano config.py # 用任意文本编辑器编辑config.py, 配置 “Proxy”, “API_KEY” 以及 “WEB_PORT” (例如50923) 等 -docker build -t gpt-academic . # 安装 - -#(最后一步-Linux操作系统)用`--net=host`更方便快捷 -docker run --rm -it --net=host gpt-academic -#(最后一步-MacOS/Windows操作系统)只能用-p选项将容器上的端口(例如50923)暴露给主机上的端口 -docker run --rm -it -e WEB_PORT=50923 -p 50923:50923 gpt-academic +# 修改docker-compose.yml,保留方案1并删除其他方案。修改docker-compose.yml中方案1的配置,参考其中注释即可 +docker-compose up ``` -P.S. 如果需要依赖Latex的插件功能,请见Wiki。另外,您也可以直接使用docker-compose获取Latex功能(修改docker-compose.yml,保留方案4并删除其他方案)。 + +P.S. 如果需要依赖Latex的插件功能,请见Wiki。另外,您也可以直接使用方案4或者方案0获取Latex功能。 2. ChatGPT + ChatGLM2 + MOSS + LLAMA2 + 通义千问(需要熟悉[Nvidia Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian)运行时) [![chatglm](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-chatglm.yml/badge.svg?branch=master)](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-chatglm.yml) @@ -309,6 +310,7 @@ Tip:不指定文件直接点击 `载入对话历史存档` 可以查看历史h ### II:版本: - version 3.60(todo): 优化虚空终端,引入code interpreter和更多插件 +- version 3.53: 支持动态选择不同界面主题,提高稳定性&解决多用户冲突问题 - version 3.50: 使用自然语言调用本项目的所有函数插件(虚空终端),支持插件分类,改进UI,设计新主题 - version 3.49: 支持百度千帆平台和文心一言 - version 3.48: 支持阿里达摩院通义千问,上海AI-Lab书生,讯飞星火 diff --git a/crazy_functions/pdf_fns/parse_pdf.py b/crazy_functions/pdf_fns/parse_pdf.py index 8a7117a..a726aa2 100644 --- a/crazy_functions/pdf_fns/parse_pdf.py +++ b/crazy_functions/pdf_fns/parse_pdf.py @@ -1,6 +1,14 @@ +from functools import lru_cache +from toolbox import gen_time_str +from toolbox import promote_file_to_downloadzone +from toolbox import write_history_to_file, promote_file_to_downloadzone +from colorful import * import requests import random -from functools import lru_cache +import copy +import os +import math + class GROBID_OFFLINE_EXCEPTION(Exception): pass def get_avail_grobid_url(): @@ -28,3 +36,133 @@ def parse_pdf(pdf_path, grobid_url): raise RuntimeError("解析PDF失败,请检查PDF是否损坏。") return article_dict + +def produce_report_markdown(gpt_response_collection, meta, paper_meta_info, chatbot, fp, generated_conclusion_files): + # -=-=-=-=-=-=-=-= 写出第1个文件:翻译前后混合 -=-=-=-=-=-=-=-= + res_path = write_history_to_file(meta + ["# Meta Translation" , paper_meta_info] + gpt_response_collection, file_basename=f"{gen_time_str()}translated_and_original.md", file_fullname=None) + promote_file_to_downloadzone(res_path, rename_file=os.path.basename(res_path)+'.md', chatbot=chatbot) + generated_conclusion_files.append(res_path) + + # -=-=-=-=-=-=-=-= 写出第2个文件:仅翻译后的文本 -=-=-=-=-=-=-=-= + translated_res_array = [] + # 记录当前的大章节标题: + last_section_name = "" + for index, value in enumerate(gpt_response_collection): + # 先挑选偶数序列号: + if index % 2 != 0: + # 先提取当前英文标题: + cur_section_name = gpt_response_collection[index-1].split('\n')[0].split(" Part")[0] + # 如果index是1的话,则直接使用first section name: + if cur_section_name != last_section_name: + cur_value = cur_section_name + '\n' + last_section_name = copy.deepcopy(cur_section_name) + else: + cur_value = "" + # 再做一个小修改:重新修改当前part的标题,默认用英文的 + cur_value += value + translated_res_array.append(cur_value) + res_path = write_history_to_file(meta + ["# Meta Translation" , paper_meta_info] + translated_res_array, + file_basename = f"{gen_time_str()}-translated_only.md", + file_fullname = None, + auto_caption = False) + promote_file_to_downloadzone(res_path, rename_file=os.path.basename(res_path)+'.md', chatbot=chatbot) + generated_conclusion_files.append(res_path) + return res_path + +def translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_files, TOKEN_LIMIT_PER_FRAGMENT, DST_LANG): + from crazy_functions.crazy_utils import construct_html + from crazy_functions.crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf + from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive + from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency + + prompt = "以下是一篇学术论文的基本信息:\n" + # title + title = article_dict.get('title', '无法获取 title'); prompt += f'title:{title}\n\n' + # authors + authors = article_dict.get('authors', '无法获取 authors'); prompt += f'authors:{authors}\n\n' + # abstract + abstract = article_dict.get('abstract', '无法获取 abstract'); prompt += f'abstract:{abstract}\n\n' + # command + prompt += f"请将题目和摘要翻译为{DST_LANG}。" + meta = [f'# Title:\n\n', title, f'# Abstract:\n\n', abstract ] + + # 单线,获取文章meta信息 + paper_meta_info = yield from request_gpt_model_in_new_thread_with_ui_alive( + inputs=prompt, + inputs_show_user=prompt, + llm_kwargs=llm_kwargs, + chatbot=chatbot, history=[], + sys_prompt="You are an academic paper reader。", + ) + + # 多线,翻译 + inputs_array = [] + inputs_show_user_array = [] + + # get_token_num + from request_llm.bridge_all import model_info + enc = model_info[llm_kwargs['llm_model']]['tokenizer'] + def get_token_num(txt): return len(enc.encode(txt, disallowed_special=())) + + def break_down(txt): + raw_token_num = get_token_num(txt) + if raw_token_num <= TOKEN_LIMIT_PER_FRAGMENT: + return [txt] + else: + # raw_token_num > TOKEN_LIMIT_PER_FRAGMENT + # find a smooth token limit to achieve even seperation + count = int(math.ceil(raw_token_num / TOKEN_LIMIT_PER_FRAGMENT)) + token_limit_smooth = raw_token_num // count + count + return breakdown_txt_to_satisfy_token_limit_for_pdf(txt, get_token_fn=get_token_num, limit=token_limit_smooth) + + for section in article_dict.get('sections'): + if len(section['text']) == 0: continue + section_frags = break_down(section['text']) + for i, fragment in enumerate(section_frags): + heading = section['heading'] + if len(section_frags) > 1: heading += f' Part-{i+1}' + inputs_array.append( + f"你需要翻译{heading}章节,内容如下: \n\n{fragment}" + ) + inputs_show_user_array.append( + f"# {heading}\n\n{fragment}" + ) + + gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency( + inputs_array=inputs_array, + inputs_show_user_array=inputs_show_user_array, + llm_kwargs=llm_kwargs, + chatbot=chatbot, + history_array=[meta for _ in inputs_array], + sys_prompt_array=[ + "请你作为一个学术翻译,负责把学术论文准确翻译成中文。注意文章中的每一句话都要翻译。" for _ in inputs_array], + ) + # -=-=-=-=-=-=-=-= 写出Markdown文件 -=-=-=-=-=-=-=-= + produce_report_markdown(gpt_response_collection, meta, paper_meta_info, chatbot, fp, generated_conclusion_files) + + # -=-=-=-=-=-=-=-= 写出HTML文件 -=-=-=-=-=-=-=-= + ch = construct_html() + orig = "" + trans = "" + gpt_response_collection_html = copy.deepcopy(gpt_response_collection) + for i,k in enumerate(gpt_response_collection_html): + if i%2==0: + gpt_response_collection_html[i] = inputs_show_user_array[i//2] + else: + # 先提取当前英文标题: + cur_section_name = gpt_response_collection[i-1].split('\n')[0].split(" Part")[0] + cur_value = cur_section_name + "\n" + gpt_response_collection_html[i] + gpt_response_collection_html[i] = cur_value + + final = ["", "", "一、论文概况", "", "Abstract", paper_meta_info, "二、论文翻译", ""] + final.extend(gpt_response_collection_html) + for i, k in enumerate(final): + if i%2==0: + orig = k + if i%2==1: + trans = k + ch.add_row(a=orig, b=trans) + create_report_file_name = f"{os.path.basename(fp)}.trans.html" + html_file = ch.save_file(create_report_file_name) + generated_conclusion_files.append(html_file) + promote_file_to_downloadzone(html_file, rename_file=os.path.basename(html_file), chatbot=chatbot) diff --git a/crazy_functions/批量翻译PDF文档_NOUGAT.py b/crazy_functions/批量翻译PDF文档_NOUGAT.py index fc814b9..c0961c1 100644 --- a/crazy_functions/批量翻译PDF文档_NOUGAT.py +++ b/crazy_functions/批量翻译PDF文档_NOUGAT.py @@ -1,11 +1,12 @@ -from toolbox import CatchException, report_execption, gen_time_str +from toolbox import CatchException, report_execption, get_log_folder, gen_time_str from toolbox import update_ui, promote_file_to_downloadzone, update_ui_lastest_msg, disable_auto_promotion -from toolbox import write_history_to_file, get_log_folder +from toolbox import write_history_to_file, promote_file_to_downloadzone from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency from .crazy_utils import read_and_clean_pdf_text -from .pdf_fns.parse_pdf import parse_pdf, get_avail_grobid_url +from .pdf_fns.parse_pdf import parse_pdf, get_avail_grobid_url, translate_pdf from colorful import * +import copy import os import math import logging @@ -92,7 +93,7 @@ def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst def 解析PDF_基于NOUGAT(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt): import copy import tiktoken - TOKEN_LIMIT_PER_FRAGMENT = 1280 + TOKEN_LIMIT_PER_FRAGMENT = 1024 generated_conclusion_files = [] generated_html_files = [] DST_LANG = "中文" @@ -106,96 +107,7 @@ def 解析PDF_基于NOUGAT(file_manifest, project_folder, llm_kwargs, plugin_kwa article_content = f.readlines() article_dict = markdown_to_dict(article_content) logging.info(article_dict) - - prompt = "以下是一篇学术论文的基本信息:\n" - # title - title = article_dict.get('title', '无法获取 title'); prompt += f'title:{title}\n\n' - # authors - authors = article_dict.get('authors', '无法获取 authors'); prompt += f'authors:{authors}\n\n' - # abstract - abstract = article_dict.get('abstract', '无法获取 abstract'); prompt += f'abstract:{abstract}\n\n' - # command - prompt += f"请将题目和摘要翻译为{DST_LANG}。" - meta = [f'# Title:\n\n', title, f'# Abstract:\n\n', abstract ] - - # 单线,获取文章meta信息 - paper_meta_info = yield from request_gpt_model_in_new_thread_with_ui_alive( - inputs=prompt, - inputs_show_user=prompt, - llm_kwargs=llm_kwargs, - chatbot=chatbot, history=[], - sys_prompt="You are an academic paper reader。", - ) - - # 多线,翻译 - inputs_array = [] - inputs_show_user_array = [] - - # get_token_num - from request_llm.bridge_all import model_info - enc = model_info[llm_kwargs['llm_model']]['tokenizer'] - def get_token_num(txt): return len(enc.encode(txt, disallowed_special=())) - from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf - - def break_down(txt): - raw_token_num = get_token_num(txt) - if raw_token_num <= TOKEN_LIMIT_PER_FRAGMENT: - return [txt] - else: - # raw_token_num > TOKEN_LIMIT_PER_FRAGMENT - # find a smooth token limit to achieve even seperation - count = int(math.ceil(raw_token_num / TOKEN_LIMIT_PER_FRAGMENT)) - token_limit_smooth = raw_token_num // count + count - return breakdown_txt_to_satisfy_token_limit_for_pdf(txt, get_token_fn=get_token_num, limit=token_limit_smooth) - - for section in article_dict.get('sections'): - if len(section['text']) == 0: continue - section_frags = break_down(section['text']) - for i, fragment in enumerate(section_frags): - heading = section['heading'] - if len(section_frags) > 1: heading += f' Part-{i+1}' - inputs_array.append( - f"你需要翻译{heading}章节,内容如下: \n\n{fragment}" - ) - inputs_show_user_array.append( - f"# {heading}\n\n{fragment}" - ) - - gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency( - inputs_array=inputs_array, - inputs_show_user_array=inputs_show_user_array, - llm_kwargs=llm_kwargs, - chatbot=chatbot, - history_array=[meta for _ in inputs_array], - sys_prompt_array=[ - "请你作为一个学术翻译,负责把学术论文准确翻译成中文。注意文章中的每一句话都要翻译。" for _ in inputs_array], - ) - res_path = write_history_to_file(meta + ["# Meta Translation" , paper_meta_info] + gpt_response_collection, file_basename=None, file_fullname=None) - promote_file_to_downloadzone(res_path, rename_file=os.path.basename(fp)+'.md', chatbot=chatbot) - generated_conclusion_files.append(res_path) - - ch = construct_html() - orig = "" - trans = "" - gpt_response_collection_html = copy.deepcopy(gpt_response_collection) - for i,k in enumerate(gpt_response_collection_html): - if i%2==0: - gpt_response_collection_html[i] = inputs_show_user_array[i//2] - else: - gpt_response_collection_html[i] = gpt_response_collection_html[i] - - final = ["", "", "一、论文概况", "", "Abstract", paper_meta_info, "二、论文翻译", ""] - final.extend(gpt_response_collection_html) - for i, k in enumerate(final): - if i%2==0: - orig = k - if i%2==1: - trans = k - ch.add_row(a=orig, b=trans) - create_report_file_name = f"{os.path.basename(fp)}.trans.html" - html_file = ch.save_file(create_report_file_name) - generated_html_files.append(html_file) - promote_file_to_downloadzone(html_file, rename_file=os.path.basename(html_file), chatbot=chatbot) + yield from translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_files, TOKEN_LIMIT_PER_FRAGMENT, DST_LANG) chatbot.append(("给出输出文件清单", str(generated_conclusion_files + generated_html_files))) yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 diff --git a/crazy_functions/批量翻译PDF文档_多线程.py b/crazy_functions/批量翻译PDF文档_多线程.py index d620715..0f60a90 100644 --- a/crazy_functions/批量翻译PDF文档_多线程.py +++ b/crazy_functions/批量翻译PDF文档_多线程.py @@ -1,12 +1,12 @@ -from toolbox import CatchException, report_execption, get_log_folder +from toolbox import CatchException, report_execption, get_log_folder, gen_time_str from toolbox import update_ui, promote_file_to_downloadzone, update_ui_lastest_msg, disable_auto_promotion from toolbox import write_history_to_file, promote_file_to_downloadzone from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency from .crazy_utils import read_and_clean_pdf_text -from .pdf_fns.parse_pdf import parse_pdf, get_avail_grobid_url +from .pdf_fns.parse_pdf import parse_pdf, get_avail_grobid_url, translate_pdf from colorful import * -import glob +import copy import os import math @@ -58,8 +58,8 @@ def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst def 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url): - import copy - TOKEN_LIMIT_PER_FRAGMENT = 1280 + import copy, json + TOKEN_LIMIT_PER_FRAGMENT = 1024 generated_conclusion_files = [] generated_html_files = [] DST_LANG = "中文" @@ -67,104 +67,23 @@ def 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwa for index, fp in enumerate(file_manifest): chatbot.append(["当前进度:", f"正在连接GROBID服务,请稍候: {grobid_url}\n如果等待时间过长,请修改config中的GROBID_URL,可修改成本地GROBID服务。"]); yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 article_dict = parse_pdf(fp, grobid_url) + grobid_json_res = os.path.join(get_log_folder(), gen_time_str() + "grobid.json") + with open(grobid_json_res, 'w+', encoding='utf8') as f: + f.write(json.dumps(article_dict, indent=4, ensure_ascii=False)) + promote_file_to_downloadzone(grobid_json_res, chatbot=chatbot) + if article_dict is None: raise RuntimeError("解析PDF失败,请检查PDF是否损坏。") - prompt = "以下是一篇学术论文的基本信息:\n" - # title - title = article_dict.get('title', '无法获取 title'); prompt += f'title:{title}\n\n' - # authors - authors = article_dict.get('authors', '无法获取 authors'); prompt += f'authors:{authors}\n\n' - # abstract - abstract = article_dict.get('abstract', '无法获取 abstract'); prompt += f'abstract:{abstract}\n\n' - # command - prompt += f"请将题目和摘要翻译为{DST_LANG}。" - meta = [f'# Title:\n\n', title, f'# Abstract:\n\n', abstract ] - - # 单线,获取文章meta信息 - paper_meta_info = yield from request_gpt_model_in_new_thread_with_ui_alive( - inputs=prompt, - inputs_show_user=prompt, - llm_kwargs=llm_kwargs, - chatbot=chatbot, history=[], - sys_prompt="You are an academic paper reader。", - ) - - # 多线,翻译 - inputs_array = [] - inputs_show_user_array = [] - - # get_token_num - from request_llm.bridge_all import model_info - enc = model_info[llm_kwargs['llm_model']]['tokenizer'] - def get_token_num(txt): return len(enc.encode(txt, disallowed_special=())) - from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf - - def break_down(txt): - raw_token_num = get_token_num(txt) - if raw_token_num <= TOKEN_LIMIT_PER_FRAGMENT: - return [txt] - else: - # raw_token_num > TOKEN_LIMIT_PER_FRAGMENT - # find a smooth token limit to achieve even seperation - count = int(math.ceil(raw_token_num / TOKEN_LIMIT_PER_FRAGMENT)) - token_limit_smooth = raw_token_num // count + count - return breakdown_txt_to_satisfy_token_limit_for_pdf(txt, get_token_fn=get_token_num, limit=token_limit_smooth) - - for section in article_dict.get('sections'): - if len(section['text']) == 0: continue - section_frags = break_down(section['text']) - for i, fragment in enumerate(section_frags): - heading = section['heading'] - if len(section_frags) > 1: heading += f' Part-{i+1}' - inputs_array.append( - f"你需要翻译{heading}章节,内容如下: \n\n{fragment}" - ) - inputs_show_user_array.append( - f"# {heading}\n\n{fragment}" - ) - - gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency( - inputs_array=inputs_array, - inputs_show_user_array=inputs_show_user_array, - llm_kwargs=llm_kwargs, - chatbot=chatbot, - history_array=[meta for _ in inputs_array], - sys_prompt_array=[ - "请你作为一个学术翻译,负责把学术论文准确翻译成中文。注意文章中的每一句话都要翻译。" for _ in inputs_array], - ) - res_path = write_history_to_file(meta + ["# Meta Translation" , paper_meta_info] + gpt_response_collection, file_basename=None, file_fullname=None) - promote_file_to_downloadzone(res_path, rename_file=os.path.basename(fp)+'.md', chatbot=chatbot) - generated_conclusion_files.append(res_path) - - ch = construct_html() - orig = "" - trans = "" - gpt_response_collection_html = copy.deepcopy(gpt_response_collection) - for i,k in enumerate(gpt_response_collection_html): - if i%2==0: - gpt_response_collection_html[i] = inputs_show_user_array[i//2] - else: - gpt_response_collection_html[i] = gpt_response_collection_html[i] - - final = ["", "", "一、论文概况", "", "Abstract", paper_meta_info, "二、论文翻译", ""] - final.extend(gpt_response_collection_html) - for i, k in enumerate(final): - if i%2==0: - orig = k - if i%2==1: - trans = k - ch.add_row(a=orig, b=trans) - create_report_file_name = f"{os.path.basename(fp)}.trans.html" - html_file = ch.save_file(create_report_file_name) - generated_html_files.append(html_file) - promote_file_to_downloadzone(html_file, rename_file=os.path.basename(html_file), chatbot=chatbot) - + yield from translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_files, TOKEN_LIMIT_PER_FRAGMENT, DST_LANG) chatbot.append(("给出输出文件清单", str(generated_conclusion_files + generated_html_files))) yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt): + """ + 此函数已经弃用 + """ import copy - TOKEN_LIMIT_PER_FRAGMENT = 1280 + TOKEN_LIMIT_PER_FRAGMENT = 1024 generated_conclusion_files = [] generated_html_files = [] from crazy_functions.crazy_utils import construct_html diff --git a/docker-compose.yml b/docker-compose.yml index 2387527..bfe0c28 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -1,5 +1,54 @@ #【请修改完参数后,删除此行】请在以下方案中选择一种,然后删除其他的方案,最后docker-compose up运行 | Please choose from one of these options below, delete other options as well as This Line +## =================================================== +## 【方案零】 部署项目的全部能力(这个是包含cuda和latex的大型镜像。如果您网速慢、硬盘小或没有显卡,则不推荐使用这个) +## =================================================== +version: '3' +services: + gpt_academic_full_capability: + image: ghcr.io/binary-husky/gpt_academic_with_all_capacity:master + environment: + # 请查阅 `config.py`或者 github wiki 以查看所有的配置信息 + API_KEY: ' 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 ' + # USE_PROXY: ' True ' + # proxies: ' { "http": "http://localhost:10881", "https": "http://localhost:10881", } ' + LLM_MODEL: ' gpt-3.5-turbo ' + AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "gpt-4", "qianfan", "sparkv2", "spark", "chatglm"] ' + BAIDU_CLOUD_API_KEY : ' bTUtwEAveBrQipEowUvDwYWq ' + BAIDU_CLOUD_SECRET_KEY : ' jqXtLvXiVw6UNdjliATTS61rllG8Iuni ' + XFYUN_APPID: ' 53a8d816 ' + XFYUN_API_SECRET: ' MjMxNDQ4NDE4MzM0OSNlNjQ2NTlhMTkx ' + XFYUN_API_KEY: ' 95ccdec285364869d17b33e75ee96447 ' + ENABLE_AUDIO: ' False ' + DEFAULT_WORKER_NUM: ' 20 ' + WEB_PORT: ' 12345 ' + ADD_WAIFU: ' False ' + ALIYUN_APPKEY: ' RxPlZrM88DnAFkZK ' + THEME: ' Chuanhu-Small-and-Beautiful ' + ALIYUN_ACCESSKEY: ' LTAI5t6BrFUzxRXVGUWnekh1 ' + ALIYUN_SECRET: ' eHmI20SVWIwQZxCiTD2bGQVspP9i68 ' + # LOCAL_MODEL_DEVICE: ' cuda ' + + # 加载英伟达显卡运行时 + # runtime: nvidia + # deploy: + # resources: + # reservations: + # devices: + # - driver: nvidia + # count: 1 + # capabilities: [gpu] + + # 与宿主的网络融合 + network_mode: "host" + + # 不使用代理网络拉取最新代码 + command: > + bash -c "python3 -u main.py" + + + + ## =================================================== ## 【方案一】 如果不需要运行本地模型(仅 chatgpt, azure, 星火, 千帆, claude 等在线大模型服务) ## =================================================== diff --git a/themes/common.css b/themes/common.css index 5880d00..0f201f0 100644 --- a/themes/common.css +++ b/themes/common.css @@ -19,3 +19,8 @@ .wrap.svelte-xwlu1w { min-height: var(--size-32); } + +/* status bar height */ +.min.svelte-1yrv54 { + min-height: var(--size-12); +} \ No newline at end of file diff --git a/toolbox.py b/toolbox.py index 1452c13..6a53868 100644 --- a/toolbox.py +++ b/toolbox.py @@ -216,7 +216,7 @@ def get_reduce_token_percent(text): return 0.5, '不详' -def write_history_to_file(history, file_basename=None, file_fullname=None): +def write_history_to_file(history, file_basename=None, file_fullname=None, auto_caption=True): """ 将对话记录history以Markdown格式写入文件中。如果没有指定文件名,则使用当前时间生成文件名。 """ @@ -235,7 +235,7 @@ def write_history_to_file(history, file_basename=None, file_fullname=None): if type(content) != str: content = str(content) except: continue - if i % 2 == 0: + if i % 2 == 0 and auto_caption: f.write('## ') try: f.write(content) diff --git a/version b/version index 897b042..5562088 100644 --- a/version +++ b/version @@ -1,5 +1,5 @@ { - "version": 3.52, + "version": 3.53, "show_feature": true, - "new_feature": "提高稳定性&解决多用户冲突问题 <-> 支持插件分类和更多UI皮肤外观 <-> 支持用户使用自然语言调度各个插件(虚空终端) ! <-> 改进UI,设计新主题 <-> 支持借助GROBID实现PDF高精度翻译 <-> 接入百度千帆平台和文心一言 <-> 接入阿里通义千问、讯飞星火、上海AI-Lab书生 <-> 优化一键升级 <-> 提高arxiv翻译速度和成功率" + "new_feature": "支持动态选择不同界面主题 <-> 提高稳定性&解决多用户冲突问题 <-> 支持插件分类和更多UI皮肤外观 <-> 支持用户使用自然语言调度各个插件(虚空终端) ! <-> 改进UI,设计新主题 <-> 支持借助GROBID实现PDF高精度翻译 <-> 接入百度千帆平台和文心一言 <-> 接入阿里通义千问、讯飞星火、上海AI-Lab书生 <-> 优化一键升级 <-> 提高arxiv翻译速度和成功率" }