Merge pull request #6 from dblate/patch-4

Update 2.处理文本数据.md
long_long_ago 2025-03-16 16:37:09 +08:00 committed by GitHub
commit f33b91bf60
GPG Key ID: B5690EEEBB952194
1 changed file with 4 additions and 1 deletion

@@ -295,7 +295,10 @@ ids = tokenizer.encode(text)
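 print(ids)
 ```
-The code above prints the following token IDs `The original English book does not show the printed output here; readers can run the code themselves to see the result`
+The code above prints the following token IDs:
+```
+[1, 56, 2, 850, 988, 602, 533, 746, 5, 1126, 596, 5, 1, 67, 7, 38, 851, 1108, 754, 793, 7]
+```
 Next, let's see whether we can convert these token IDs back into text using the decode method: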
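
The changed passage relies on a tokenizer whose `encode` method maps text to token IDs and whose `decode` method maps them back. Since the tokenizer definition and vocabulary are not part of this hunk, the following is only a minimal sketch of that round trip: the `ToyTokenizer` class, its splitting regex, and the sample sentence are all assumptions made for illustration, and the IDs it prints will not match the ones shown in the diff.

```python
import re


class ToyTokenizer:
    """Hypothetical stand-in for the document's tokenizer (illustration only)."""

    def __init__(self, vocab):
        # vocab maps token string -> integer ID; build the inverse for decoding.
        self.str_to_int = dict(vocab)
        self.int_to_str = {i: s for s, i in vocab.items()}

    @staticmethod
    def split(text):
        # Split on whitespace and common punctuation, keeping punctuation as tokens.
        pieces = re.split(r'([,.:;?_!"()\']|--|\s)', text)
        return [p.strip() for p in pieces if p.strip()]

    def encode(self, text):
        # Raises KeyError for any token that is not in the vocabulary.
        return [self.str_to_int[t] for t in self.split(text)]

    def decode(self, ids):
        text = " ".join(self.int_to_str[i] for i in ids)
        # Drop the space that the join inserted before punctuation marks.
        return re.sub(r'\s+([,.:;?!"()\'])', r'\1', text)


# Assumed sample text; the vocabulary is built from this sentence alone.
sample = "Hello, world. Is this a test?"
vocab = {tok: i for i, tok in enumerate(sorted(set(ToyTokenizer.split(sample))))}
tokenizer = ToyTokenizer(vocab)

ids = tokenizer.encode(sample)
print(ids)
print(tokenizer.decode(ids))
```

In this sketch, decoding restores the sample sentence exactly because every token is in the vocabulary and the punctuation spacing is simple. A word outside the vocabulary would raise a `KeyError` in `encode`, which is the kind of limitation a fuller tokenizer would need to handle.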