From bec6495095cae9880c1e1d5cc9d53ceae7120bc9 Mon Sep 17 00:00:00 2001
From: yuhui <173983476@qq.com>
Date: Tue, 11 Mar 2025 22:48:04 +0800
Subject: [PATCH] =?UTF-8?q?Update=202.=E5=A4=84=E7=90=86=E6=96=87=E6=9C=AC?= =?UTF-8?q?=E6=95=B0=E6=8D=AE.md?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 cn-Book/2.处理文本数据.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/cn-Book/2.处理文本数据.md b/cn-Book/2.处理文本数据.md
index abf92b9..7826b9f 100644
--- a/cn-Book/2.处理文本数据.md
+++ b/cn-Book/2.处理文本数据.md
@@ -295,7 +295,10 @@ ids = tokenizer.encode(text)
 print(ids)
 ```

-The code above prints the following token IDs (`the original English book does not show the printed output here; readers can run the code themselves to see the result`):
+The code above prints the following token IDs:
+```
+[1, 56, 2, 850, 988, 602, 533, 746, 5, 1126, 596, 5, 1, 67, 7, 38, 851, 1108, 754, 793, 7]
+```

 Next, let's see whether we can convert these token IDs back into text using the decode method: