
Will there be DeepSpeed-accelerated training and inference? #46

Open
heyday111 opened this issue Apr 17, 2023 · 5 comments

Comments

@heyday111

Right now, running tuoling summarization takes about 15 s per dialogue to produce a result. I hope multi-GPU-accelerated inference can be offered later.

@LC1332
Owner

LC1332 commented Apr 17, 2023

Are you using an A100 or a T4? An A100 should be quite a bit faster. We will look into model parallelism later on. Thanks for the feedback!

@LC1332
Owner

LC1332 commented Apr 17, 2023

Actually, the main issue is this: GLM has not been moved into the standard Hugging Face pipeline. If it were, it should be possible to accelerate it directly with accelerate. Later I want to check whether other models already in the HF pipeline can do this. I think this discussion is quite valuable.
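For models that are in the standard Hugging Face pipeline, the multi-GPU sharding mentioned above is usually done through accelerate's `device_map="auto"`. A minimal sketch (the model name and dtype are illustrative; this assumes `transformers` and `accelerate` are installed and enough GPU memory is available):

```python
def load_sharded(model_name="THUDM/chatglm-6b"):
    """Shard a model across all visible GPUs via accelerate's device_map.

    Illustrative sketch only. ChatGLM needs trust_remote_code=True because
    its modeling code is not a built-in transformers architecture -- which
    is exactly the pipeline limitation described in the comment above.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    # device_map="auto" lets accelerate place layers across available GPUs
    # (and spill to CPU if they do not fit), enabling multi-GPU inference.
    model = AutoModel.from_pretrained(
        model_name,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    return tokenizer, model.eval()
```

This shards layers across devices (pipeline-style placement), which reduces per-GPU memory but does not by itself parallelize a single forward pass the way tensor parallelism would.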

@heyday111
Author

I'm using a V100 32G; the model takes about 14 GB of VRAM once loaded. I tried ChatGLM's DeepSpeed support, but the underlying code doesn't seem to support inference on Luotuo yet: after loading onto multiple GPUs it raises an error along the lines of "the same data cannot be loaded on two devices".
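For context on what that attempt looks like: the usual DeepSpeed-Inference entry point wraps an already-loaded model and splits it across GPUs. A sketch under the assumption of an older-style `mp_size` API (whether ChatGLM's custom modeling code is compatible with kernel injection is exactly the open problem described above):

```python
def deepspeed_shard(model, world_size=2):
    """Wrap a loaded model for DeepSpeed tensor-parallel inference.

    Sketch only: mp_size / replace_with_kernel_inject follow the standard
    DeepSpeed-Inference call pattern. Custom architectures like ChatGLM may
    not be supported by kernel injection, which can surface as the
    multi-GPU loading error reported in this comment.
    """
    import torch
    import deepspeed

    return deepspeed.init_inference(
        model,
        mp_size=world_size,              # number of GPUs to split across
        dtype=torch.half,
        replace_with_kernel_inject=True, # swap in fused inference kernels
    )
```

The script would then be launched with the DeepSpeed launcher, e.g. `deepspeed --num_gpus 2 infer.py`, so that each rank holds its shard of the weights.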

@heyday111
Author

May I ask what other acceleration options there are? For example, would quantizing the model speed up inference?

@LC1332
Owner

LC1332 commented Apr 18, 2023

May I ask what other acceleration options there are? For example, would quantizing the model speed up inference?

I saw a very impressive piece of work today: https://zhuanlan.zhihu.com/p/622754642 (High-throughput Generative Inference of Large Language Models with a Single GPU), but integrating it feels like a substantial amount of code.
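On the quantization question quoted above: quantization mainly cuts weight memory and memory bandwidth, which is often the bottleneck for single-stream LLM inference. A back-of-envelope estimate for a ~6.2B-parameter model like ChatGLM-6B (weights only, ignoring activations and the KV cache):

```python
def weight_gib(n_params: float, bits: int) -> float:
    """Approximate weight memory in GiB at the given precision."""
    return n_params * bits / 8 / 2**30

N = 6.2e9  # ChatGLM-6B has roughly 6.2B parameters
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_gib(N, bits):.1f} GiB")
# fp16 comes to about 11.5 GiB, int8 about 5.8 GiB, int4 about 2.9 GiB
```

The fp16 figure is consistent with the ~14 GB observed above once activations and framework overhead are added, and it suggests why the int8/int4 quantized variants that ChatGLM-6B ships can fit on much smaller GPUs; whether quantization also speeds up each forward pass depends on kernel support for the lower precision.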
