binary-husky c3140ce344
merge frontier branch (#1620)
* Zhipu sdk update: adapt to the latest Zhipu SDK, support GLM4v (#1502)

* Adapt google gemini: extract files from the user input

* Adapt to the latest Zhipu SDK, support glm-4v

* requirements.txt fix

* pending history check

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* Update "生成多种Mermaid图表" plugin: Separate out the file reading function (#1520)

* Update crazy_functional.py with new functionality deal with PDF

* Update crazy_functional.py and Mermaid.py for plugin_kwargs

* Update crazy_functional.py with new chart type: mind map

* Update SELECT_PROMPT and i_say_show_user messages

* Update ArgsReminder message in get_crazy_functions() function

* Update with read md file and update PROMPTS

* Return the PROMPTS as the test found that the initial version worked best

* Update Mermaid chart generation function

* version 3.71

* Resolve issue #1510

* Remove unnecessary text from sys_prompt in 解析历史输入 function

* Remove sys_prompt message in 解析历史输入 function

* Update bridge_all.py: supports gpt-4-turbo-preview (#1517)

* Update bridge_all.py: supports gpt-4-turbo-preview

supports gpt-4-turbo-preview

* Update bridge_all.py

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* Update config.py: supports gpt-4-turbo-preview (#1516)

* Update config.py: supports gpt-4-turbo-preview

supports gpt-4-turbo-preview

* Update config.py

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* Refactor 解析历史输入 function to handle file input

* Update Mermaid chart generation functionality

* rename files and functions

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: hongyi-zhao <hongyi.zhao@gmail.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* Integrate mathpix OCR (#1468)

* Update Latex输出PDF结果.py

Implemented 'PDF翻译中文并重新编译PDF' with the help of mathpix

* Update config.py

add mathpix appid & appkey

* Add 'PDF翻译中文并重新编译PDF' feature to plugins.

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* fix zhipuai

* check picture

* remove glm-4 due to bug

* Modify config

* Check MATHPIX_APPID

* Remove unnecessary code and update function_plugins dictionary

* capture non-standard token overflow

* bug fix #1524

* change mermaid style

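* Support mermaid zoom in, zoom out and reset, with mouse scroll and drag (#1530)

* Support mermaid zoom in, zoom out and reset, with mouse scroll and drag

* Fine-tuning did not work out, staging for now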

* update

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* ver 3.72

* change live2d

* save the status of `clear btn` in cookie

* Persist frontend selections

* js ui bug fix

* reset btn bug fix

* update live2d tips

* fix missing get_token_num method

* fix live2d toggle switch

* fix persistent custom btn with cookie

* fix zhipuai feedback with core functionality

* Refactor button update and clean up functions

* trailing space removal

* Fix missing MATHPIX_APPID and MATHPIX_APPKEY configuration

* Prompt fix, mind map prompt optimization (#1537)

* Adapt google gemini: extract files from the user input

* Mind map prompt optimization

* Fix missing MATHPIX_APPID and MATHPIX_APPKEY configuration

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* Optimize the "PDF翻译中文并重新编译PDF" plugin (#1602)

* Add gemini_endpoint to API_URL_REDIRECT (#1560)

* Add gemini_endpoint to API_URL_REDIRECT

* Update gemini-pro and gemini-pro-vision model_info endpoints

* Update to support new claude models (#1606)

* Add anthropic library and update claude models

* Update bridge_claude.py: add support for image input and fix some bugs.

* Add the Claude_3_Models variable to limit the number of images

* Refactor code to improve readability and maintainability

* minor claude bug fix

* more flexible one-api support

* reformat config

* fix one-api new access bug

* dummy

* compat non-standard api

* version 3.73

---------

Co-authored-by: XIao <46100050+Kilig947@users.noreply.github.com>
Co-authored-by: Menghuan1918 <menghuan2003@outlook.com>
Co-authored-by: hongyi-zhao <hongyi.zhao@gmail.com>
Co-authored-by: Hao Ma <893017927@qq.com>
Co-authored-by: zeyuan huang <599012428@qq.com>
2024-03-11 17:26:09 +08:00

230 lines
7.6 KiB
Python

# ------------------------------------------------------------------------------------------------------------------------
# 🔌💻 Source Code From https://huggingface.co/K024/ChatGLM-6b-onnx-u8s8/blob/main/model.py
# ------------------------------------------------------------------------------------------------------------------------
import re
import numpy as np
# import torch
from onnxruntime import InferenceSession, SessionOptions
# Currently `MatMulInteger` and `DynamicQuantizeLinear` are only supported on CPU,
# although they are documented as supported on CUDA.
providers = ["CPUExecutionProvider"]
# if torch.cuda.is_available():
# providers = ["CUDAExecutionProvider"] + providers
# Default paths
tokenizer_path = "chatglm-6b-int8-onnx-merged/sentencepiece.model"
onnx_model_path = "chatglm-6b-int8-onnx-merged/chatglm-6b-int8.onnx"
# input & output names
past_names = [f"past_{name}_{i}" for i in range(28) for name in ["key", "value"]]
present_names = [f"present_{name}_{i}" for i in range(28) for name in ["key", "value"]]
output_names = ["logits"] + present_names
# default kv_cache for first inference
default_past_key_values = {
    k: np.zeros((1, 0, 32, 128), dtype=np.float32) for k in past_names
}
def chat_template(history: list[tuple[str, str]], current: str):
    # Build the multi-round prompt: past (question, answer) pairs, then the open question.
    prompt = ""
    chat_round = 0
    for question, answer in history:
        prompt += f"[Round {chat_round}]\n问:{question}\n答:{answer}\n"
        chat_round += 1
    prompt += f"[Round {chat_round}]\n问:{current}\n答:"
    return prompt
def process_response(response: str):
    response = response.strip()
    response = response.replace("[[训练时间]]", "2023年")
    # Map ASCII punctuation next to a CJK character to its full-width form.
    # The full-width targets were lost in this listing (they showed up as empty
    # strings) and are restored here.
    punkts = [
        [",", ","],
        ["!", "!"],
        [":", ":"],
        [";", ";"],
        [r"\?", "?"],
    ]
    for item in punkts:
        response = re.sub(r"([\u4e00-\u9fff])%s" % item[0], r"\1%s" % item[1], response)
        response = re.sub(r"%s([\u4e00-\u9fff])" % item[0], r"%s\1" % item[1], response)
    return response
class ChatGLMModel():
    def __init__(self, onnx_model_path=onnx_model_path, tokenizer_path=tokenizer_path, profile=False) -> None:
        self.tokenizer = ChatGLMTokenizer(tokenizer_path)
        options = SessionOptions()
        options.enable_profiling = profile
        self.session = InferenceSession(onnx_model_path, options, providers=providers)
        self.eop_token_id = self.tokenizer["<eop>"]

    def prepare_input(self, prompt: str):
        input_ids, prefix_mask = self.tokenizer.encode(prompt)
        input_ids = np.array([input_ids], dtype=np.longlong)
        prefix_mask = np.array([prefix_mask], dtype=np.longlong)
        return input_ids, prefix_mask, default_past_key_values

    def sample_next_token(self, logits: np.ndarray, top_k=50, top_p=0.7, temperature=1):
        # softmax with temperature (shifted by the max to avoid overflow in exp)
        scaled_logits = logits / temperature
        exp_logits = np.exp(scaled_logits - np.max(scaled_logits))
        probs = exp_logits / np.sum(exp_logits)
        # top k
        top_k_idx = np.argsort(-probs)[:top_k]
        top_k_probs = probs[top_k_idx]
        # top p: drop a token once the probability mass before it exceeds top_p
        cumsum_probs = np.cumsum(top_k_probs)
        top_k_probs[(cumsum_probs - top_k_probs) > top_p] = 0.0
        top_k_probs = top_k_probs / np.sum(top_k_probs)
        # sample
        next_token = np.random.choice(top_k_idx, size=1, p=top_k_probs)
        return next_token[0].item()

    def generate_iterate(self, prompt: str, max_generated_tokens=100, top_k=50, top_p=0.7, temperature=1):
        input_ids, prefix_mask, past_key_values = self.prepare_input(prompt)
        output_tokens = []
        while True:
            inputs = {
                "input_ids": input_ids,
                "prefix_mask": prefix_mask,
                "use_past": np.array(len(output_tokens) > 0),
            }
            inputs.update(past_key_values)
            logits, *past_key_values = self.session.run(output_names, inputs)
            past_key_values = {k: v for k, v in zip(past_names, past_key_values)}
            next_token = self.sample_next_token(logits[0, -1], top_k=top_k, top_p=top_p, temperature=temperature)
            output_tokens += [next_token]
            if next_token == self.eop_token_id or len(output_tokens) > max_generated_tokens:
                break
            # After the first step only the new token is fed; the rest comes from the kv cache.
            input_ids = np.array([[next_token]], dtype=np.longlong)
            prefix_mask = np.concatenate([prefix_mask, np.array([[0]], dtype=np.longlong)], axis=1)
            yield process_response(self.tokenizer.decode(output_tokens))
        return process_response(self.tokenizer.decode(output_tokens))
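
The filtering done in `sample_next_token` can be tried without a model. The sketch below is a pure-Python restatement of the same top-k then top-p steps on a toy list of logits. The function name `filter_top_k_top_p` and the toy values are hypothetical and are not part of the original file; unlike the method above, it returns the filtered distribution instead of drawing a sample.

```python
import math


def filter_top_k_top_p(logits, top_k=50, top_p=0.7, temperature=1.0):
    """Return (token_ids, probs) that survive top-k then top-p filtering.

    Hypothetical helper: mirrors the filtering in sample_next_token,
    but works on a plain list and does not draw a random sample.
    """
    # Softmax with temperature, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most probable token ids, most probable first.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])[:top_k]
    kept = [probs[i] for i in order]

    # Top-p: zero out a token once the mass before it already exceeds top_p.
    cumulative = 0.0
    for pos, p in enumerate(kept):
        if cumulative > top_p:
            kept[pos] = 0.0
        cumulative += p

    # Renormalise the survivors so they sum to 1.
    norm = sum(kept)
    return order, [p / norm for p in kept]


ids, probs = filter_top_k_top_p([2.0, 1.0, 0.0, -1.0], top_k=3, top_p=0.7)
print(ids, [round(p, 3) for p in probs])
```

With these toy logits, top-k keeps token ids 0, 1 and 2, and top-p then removes id 2 because the two tokens before it already hold more than 0.7 of the probability mass.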
# ------------------------------------------------------------------------------------------------------------------------
# 🔌💻 Source Code From https://huggingface.co/K024/ChatGLM-6b-onnx-u8s8/blob/main/tokenizer.py
# ------------------------------------------------------------------------------------------------------------------------
import re
from sentencepiece import SentencePieceProcessor
def replace_spaces_with_blank(match: re.Match[str]):
    return f"<|blank_{len(match.group())}|>"


def replace_blank_with_spaces(match: re.Match[str]):
    return " " * int(match.group(1))
class ChatGLMTokenizer:
    def __init__(self, vocab_file):
        assert vocab_file is not None
        self.vocab_file = vocab_file
        self.special_tokens = ["[MASK]", "[gMASK]", "[sMASK]", "<unused_0>", "<sop>", "<eop>", "<ENC>", "<dBLOCK>"]
        self.text_tokenizer = SentencePieceProcessor(str(vocab_file))

    def __len__(self):
        return len(self.text_tokenizer)

    def __getitem__(self, key: str):
        return self.text_tokenizer[key]

    def preprocess(self, text: str, linebreak=True, whitespaces=True):
        if linebreak:
            text = text.replace("\n", "<n>")
        if whitespaces:
            text = text.replace("\t", "<|tab|>")
            text = re.sub(r" {2,80}", replace_spaces_with_blank, text)
        return text

    def encode(
        self, text: str, text_pair: str = None,
        linebreak=True, whitespaces=True,
        add_dummy_prefix=True, special_tokens=True,
    ) -> tuple[list[int], list[int]]:
        """
        text: Text to encode. Bidirectional part with a [gMASK] and an <sop> for causal LM.
        text_pair: causal LM part.
        linebreak: Whether to encode newline (\\n) in text.
        whitespaces: Whether to encode multiple whitespaces or tab in text, useful for source code encoding.
        special_tokens: Whether to encode special token ([MASK], [gMASK], etc.) in text.
        add_dummy_prefix: Whether to add dummy blank space in the beginning.
        """
        text = self.preprocess(text, linebreak, whitespaces)
        if not add_dummy_prefix:
            text = "<n>" + text
        tokens = self.text_tokenizer.encode(text)
        prefix_mask = [1] * len(tokens)
        if special_tokens:
            tokens += [self.text_tokenizer["[gMASK]"], self.text_tokenizer["<sop>"]]
            prefix_mask += [1, 0]
        if text_pair is not None:
            text_pair = self.preprocess(text_pair, linebreak, whitespaces)
            pair_tokens = self.text_tokenizer.encode(text_pair)
            tokens += pair_tokens
            prefix_mask += [0] * len(pair_tokens)
            if special_tokens:
                tokens += [self.text_tokenizer["<eop>"]]
                prefix_mask += [0]
        return (tokens if add_dummy_prefix else tokens[2:]), prefix_mask

    def decode(self, text_ids: list[int]) -> str:
        text = self.text_tokenizer.decode(text_ids)
        text = text.replace("<n>", "\n")
        text = text.replace("<|tab|>", "\t")
        text = re.sub(r"<\|blank_(\d\d?)\|>", replace_blank_with_spaces, text)
        return text
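
The text normalisation that `preprocess` and `decode` wrap around SentencePiece can be exercised on its own. The sketch below reuses the same placeholder tokens and regular expressions; the function names `to_placeholders` and `from_placeholders` are hypothetical, and SentencePiece itself is left out, so this only shows the string-level round trip.

```python
import re


def to_placeholders(text: str) -> str:
    """Replace newlines, tabs and runs of 2-80 spaces with placeholder tokens."""
    text = text.replace("\n", "<n>")
    text = text.replace("\t", "<|tab|>")
    return re.sub(r" {2,80}", lambda m: f"<|blank_{len(m.group())}|>", text)


def from_placeholders(text: str) -> str:
    """Turn the placeholder tokens back into newlines, tabs and spaces."""
    text = text.replace("<n>", "\n")
    text = text.replace("<|tab|>", "\t")
    return re.sub(r"<\|blank_(\d\d?)\|>", lambda m: " " * int(m.group(1)), text)


source = "def f():\n    return 1\t# done"
encoded = to_placeholders(source)
print(encoded)  # def f():<n><|blank_4|>return 1<|tab|># done
restored = from_placeholders(encoded)
```

Single spaces are left untouched, which is why only the four-space indent becomes a `<|blank_4|>` token. The round trip is exact for text like this, but not for input that already contains a literal placeholder such as `<n>`.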