Update 2.处理文本数据.md

yuhui 2025-03-11 22:48:04 +08:00 committed by GitHub
parent e95b854405
commit bec6495095
GPG Key ID: B5690EEEBB952194
1 changed files with 4 additions and 1 deletions


@@ -295,7 +295,10 @@ ids = tokenizer.encode(text)
print(ids)
```
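The code above prints the following token IDs `The original English book does not show the printed output here; readers can run the code themselves to see the result`
The code above prints the following token IDs: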
```
[1, 56, 2, 850, 988, 602, 533, 746, 5, 1126, 596, 5, 1, 67, 7, 38, 851, 1108, 754, 793, 7]
```
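
To make the origin of such an ID list concrete, here is a minimal sketch of a vocabulary-lookup encoder. It is not the tutorial's tokenizer: the helper names `split_tokens`, `build_vocab`, and `encode`, the tiny corpus, and the splitting regex are all assumptions chosen for illustration, so the IDs differ from the ones printed above.

```python
import re

# Hypothetical helpers for illustration; not the tutorial's tokenizer class.
def split_tokens(text):
    # Split on punctuation, double dashes, and whitespace, keeping the
    # punctuation as separate tokens and dropping empty or blank pieces.
    pieces = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p.strip() for p in pieces if p.strip()]

def build_vocab(corpus):
    # Map each unique token to an integer ID, in sorted order.
    unique_tokens = sorted(set(split_tokens(corpus)))
    return {token: idx for idx, token in enumerate(unique_tokens)}

def encode(text, vocab):
    # Look up each token's ID; a token missing from vocab raises KeyError.
    return [vocab[token] for token in split_tokens(text)]

corpus = "the cat sat. the dog sat, too."  # assumed toy corpus
vocab = build_vocab(corpus)
ids = encode("the dog sat.", vocab)
print(ids)  # [5, 3, 4, 1]
```

Each ID is simply the position of its token in the sorted vocabulary, which is why punctuation marks tend to receive small IDs.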
Next, let's see whether the decode method can convert these token IDs back into text: