This is a longer TODO List

Since I am a native Chinese speaker, I will first write this TODO List in Chinese. Later, I will find time to translate this text into English using ChatGPT.

Here, we only disclose the tasks that we obviously need to do or can do. Some of the tasks we have planned may not be released in this document.

If you complete an item from the "could be done" list and it gets merged into this project, we will add you to the contributor list on the page. I am also planning a contributor community called "Silk Road", and major contributors will be invited to join it.

We plan to add a "Training Wishlist". If you intend to train a model, you can pick data from the wishlist that nobody has trained on yet and that interests you, train on it, and merge the result into the project. This also makes you a direct contributor to the project.

on track

  • translate the Alpaca JSON data into Chinese
  • fine-tune with LoRA (model 0.1)
  • release the 0.1 model (model A)
  • upload the model to Hugging Face, GUI demo
  • train LoRA with more Alpaca data (model 0.3)
  • train LoRA with more Alpaca data (model 0.9)
  • Add data about the model developer (necessary for upgrading to 1.0)
  • Fix the prompt settings for the ChatLuotuo webUI and for evaluation, which are inconsistent and cause the webUI to perform poorly.
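
The last item is about the webUI and the evaluation code building prompts differently. One common way to avoid that kind of drift is to route both through a single prompt-building function. The sketch below is only an illustration: `build_prompt` and both template constants are hypothetical names, and the template text is modeled on the Stanford Alpaca prompt format, not taken from this project's code.

```python
# Hypothetical shared prompt builder. The templates follow the Stanford Alpaca
# format; that this project uses exactly this wording is an assumption.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Build the prompt string; the webUI and evaluation would both call this."""
    if input_text.strip():
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)
```

If both code paths import the same function, a change to the template reaches the webUI and the evaluation at the same time.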

could be done

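Data Wishlist

If you help us collect any of the data on the Data Wishlist (simply sending me the JSON is fine), I will add you to the contributor list. Merely copying NLP data that is already public does not count. The wishlist gathers ideas from the community; for now we do not have enough time or funding to write prompts and collect every batch of data on the list.

  • multi-turn (continuous) dialogue data
  • data about the movies, actors, and plots of the IMDB Top 100 or beyond (inspired by an issue)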
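
Since contributions can arrive as plain JSON, a quick sanity check before sending a file may save a round trip. The sketch below assumes records use the three keys of the Stanford Alpaca dataset (`instruction`, `input`, `output`); this document does not specify a schema, so treat the key set and the `validate_records` helper as hypothetical.

```python
import json

# Assumed record layout: the three keys used by the Stanford Alpaca dataset.
REQUIRED_KEYS = {"instruction", "input", "output"}


def validate_records(text: str) -> list:
    """Parse a JSON array and check that every record has the assumed keys."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"record {i} is not an object")
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
        # "input" may be empty, but instruction and output should carry text.
        if not str(rec["instruction"]).strip() or not str(rec["output"]).strip():
            raise ValueError(f"record {i} has an empty instruction or output")
    return records


# A one-record example file, built in memory for illustration.
sample = json.dumps(
    [
        {
            "instruction": "Name a film directed by Akira Kurosawa.",
            "input": "",
            "output": "Seven Samurai",
        }
    ]
)
print(len(validate_records(sample)))
```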

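Model Wishlist

As with the Data Wishlist above, we will publish all data that the community has collected but that has not yet been used for training. You can train on any of it; once you submit the model and it passes verification, we will add you to the contributor list as well.

  • nothing yet, because there is no data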