Hands-on | Deploying a Text Sentiment Analysis Model with OpenVINO

openlab_4276841a, updated 1 year ago

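1. Data analysis

About simplifyweibo_4_moods
  • Download: Baidu Netdisk (https://pan.baidu.com/s/16c93E5x373nsGozyWevITg#list/path=%2F)

  • Overview: more than 360,000 sentiment-labeled texts covering 4 emotions: about 200,000 labeled joy (喜悦), and about 50,000 each labeled anger (愤怒), disgust (厌恶), and low mood (低落)

  • Recommended experiments: sentiment / opinion / review polarity analysis

  • Data source: text

  • Original dataset: a Weibo sentiment analysis dataset collected from the web; the specific author and origin are unknown

  • Processing:

- Merged the original 4 documents into a single csv file
- The original corpus was word-segmented; it was restored to unsegmented text, and the encoding was unified to UTF-8
- Deduplicated and removed non-text symbols
- Class balancing of the data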
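import pandas as pd

data_path = './data/simplifyweibo_4_moods.csv'
pd_all = pd.read_csv(data_path)

# Label ids: 0 = joy, 1 = anger, 2 = disgust, 3 = low mood
moods = {0: '喜悦', 1: '愤怒', 2: '厌恶', 3: '低落'}

# Print the total row count, then the row count for each label
print('数目(总体):%d' % pd_all.shape[0])
for label, mood in moods.items():
    print('数目({}):{}'.format(mood, pd_all[pd_all.label==label].shape[0]))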
数目(总体):361744
数目(喜悦):199496
数目(愤怒):51714
数目(厌恶):55267
数目(低落):55267
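The counts above are heavily skewed toward 喜悦 (joy). As a rough illustration, the sketch below turns the printed counts into class proportions and inverse-frequency class weights. The weighting formula, total / (n_classes × count), is a common heuristic and an assumption here; nothing in the original training script is known to use it.

```python
# Counts copied from the printed output above.
counts = {"喜悦": 199496, "愤怒": 51714, "厌恶": 55267, "低落": 55267}

total = sum(counts.values())
n_classes = len(counts)

# Share of each class in the whole dataset.
proportions = {mood: n / total for mood, n in counts.items()}

# Inverse-frequency weights (assumed heuristic): rarer classes get larger weights.
class_weights = {mood: total / (n_classes * n) for mood, n in counts.items()}

for mood in counts:
    print(f"{mood}: share={proportions[mood]:.3f}, weight={class_weights[mood]:.3f}")
```

On these counts 喜悦 alone accounts for a little over half of the rows, so a classifier that always predicts it would already score about 55% accuracy on the unbalanced data.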
pd_all.sample(10)
        label  review
7771        0  回复侄女叔叔是不会让人失望的音乐和所有的宗教一样,也是***,一旦加入音乐这个宗教,在这个真...
46705       0  我要辛普森那個 米开朗基罗名作《创造亚当》,谁创造了马里奥、辛普森爸爸和梅西?
283740      2  为什么还不下雨?
268784      2  【阿姨看球好忙活】中场休息时阿姨看了会儿C 罗女友内衣**。。。身材啊~ 火爆啊~ 犯罪啊~...
236309      1  差点没把我饿死在回家的路上硬在台州三区一游~ ········
262247      2  我我我我...受到了惊吓!!克罗地亚首都萨格勒布郊区的两名渔夫日前捕获一条长达2.5米、重达...
50645       0  哈哈太厉害了哈O “都回去吧,不是咱村的!”
124971      0  【每日一拍】回来了,迎接偶的是大肉包子,肉笼,还有,还有礼物~ ~ ~ 真开心啊~ ~ 哈哈...
202320      1  极尽奢华~ 这款Alexander McQueen 的Britannia 手抓包简直是美到骨...
325942      3  回复“人有今生、前生、来生,我前生是一个和尚”,他最终信的是佛。。。。
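The processing list mentions class balancing without saying how it is done. One simple option is random downsampling to the size of the smallest class. The sketch below uses only the standard library and a tiny made-up list of (label, review) rows; the helper name `downsample_balanced` is hypothetical and not part of the original workflow.

```python
import random
from collections import defaultdict

def downsample_balanced(rows, seed=0):
    """Keep an equal number of rows per label by random downsampling.

    rows: list of (label, review) tuples. Returns a new shuffled list.
    """
    by_label = defaultdict(list)
    for label, review in rows:
        by_label[label].append((label, review))

    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_label.values())

    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced

# Tiny made-up example: label 0 is over-represented.
toy_rows = [(0, "a"), (0, "b"), (0, "c"), (0, "d"),
            (1, "e"), (2, "f"), (2, "g"), (3, "h")]
balanced_rows = downsample_balanced(toy_rows)
print(balanced_rows)
```

Downsampling discards data from the large class; weighting the loss is an alternative that keeps every row.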


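2. Loading a PaddleNLP pretrained model and fine-tuning it

This example shows how to fine-tune a pretrained model, represented here by ERNIE (Enhanced Representation through Knowledge Integration), for a Chinese text classification task.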
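pretrained_models/
├── deploy                # deployment
│   └── python
│       └── predict.py    # Python inference deployment example
├── export_model.py       # exports dynamic-graph parameters to static-graph parameters
├── predict.py            # prediction script
└── train.py              # training and evaluation script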

Note that the example is built for binary classification, whereas this task fine-tunes for 4 classes, so 'num_classes=4' is set:
import paddlenlp

model = paddlenlp.transformers.ErnieForSequenceClassification.from_pretrained('ernie-1.0', num_classes=4)
tokenizer = paddlenlp.transformers.ErnieTokenizer.from_pretrained('ernie-1.0')

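An excerpt of the training log follows. Because the data contains non-text symbols, the setup was moved from binary to 4-class classification, and the number of training epochs was small, accuracy reached only about 40% (0.41496 on the evaluation set, 0.40721 on the test set).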
$ python pretrained_models/train.py
...
global step 26500, epoch: 5, batch: 4672/5457, loss: 1.22440, accu: 0.48486, speed: 0.65 step/s
global step 26600, epoch: 5, batch: 4772/5457, loss: 1.15956, accu: 0.48500, speed: 0.65 step/s
global step 26700, epoch: 5, batch: 4872/5457, loss: 0.95659, accu: 0.48507, speed: 0.65 step/s
global step 26800, epoch: 5, batch: 4972/5457, loss: 0.94369, accu: 0.48517, speed: 0.64 step/s
global step 26900, epoch: 5, batch: 5072/5457, loss: 1.23624, accu: 0.48518, speed: 0.64 step/s
global step 27000, epoch: 5, batch: 5172/5457, loss: 0.95331, accu: 0.48488, speed: 0.64 step/s
global step 27100, epoch: 5, batch: 5272/5457, loss: 1.27223, accu: 0.48490, speed: 0.66 step/s
global step 27200, epoch: 5, batch: 5372/5457, loss: 0.90904, accu: 0.48493, speed: 0.65 step/s
eval loss: 1.25221, accu: 0.41496
test result...
eval loss: 1.25582, accu: 0.40721
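The numbers in the log can be cross-checked with a little arithmetic. The sketch below parses one log line, confirms that the global step equals completed epochs times batches per epoch plus the current batch, and computes the total step count under the assumption that training ran for exactly 5 epochs (the log ends in epoch 5). It also prints the cross-entropy of a uniform guess over 4 classes as a reference point for the reported eval loss.

```python
import math
import re

# One line copied from the training log above.
line = "global step 26500, epoch: 5, batch: 4672/5457, loss: 1.22440, accu: 0.48486"

pattern = re.compile(
    r"global step (\d+), epoch: (\d+), batch: (\d+)/(\d+), loss: ([\d.]+), accu: ([\d.]+)"
)
fields = pattern.match(line).groups()
step, epoch, batch, batches_per_epoch = (int(x) for x in fields[:4])

# Global step = completed epochs * batches per epoch + batch index in the current epoch.
expected_step = (epoch - 1) * batches_per_epoch + batch
print(step, expected_step)

# Total steps after 5 full epochs (assumed epoch count).
total_steps = 5 * batches_per_epoch
print(total_steps)

# Cross-entropy of a uniform guess over 4 classes, ln(4).
uniform_loss = math.log(4)
print(round(uniform_loss, 4))
```

The total of 27285 steps matches the checkpoint directory model_27285 used in the export command, and the eval loss of about 1.25 sits only slightly below ln 4 ≈ 1.386.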

Once the model is trained, export it with PaddleNLP's official script.
Key parts of the code:
import os

import paddle
import paddlenlp as ppnlp

# `args` holds the parsed command-line options (--params_path, --output_path)

label_map = {0: '喜悦', 1: '愤怒', 2: '厌恶', 3: '低落'}

# Load the pretrained model chosen at training time
model = ppnlp.transformers.ErnieForSequenceClassification.from_pretrained(
    'ernie-1.0', num_classes=4)

if args.params_path and os.path.isfile(args.params_path):
    state_dict = paddle.load(args.params_path)
    model.set_dict(state_dict)
    print("Loaded parameters from %s" % args.params_path)
model.eval()

# Convert to static graph with specific input description
model = paddle.jit.to_static(
    model,
    input_spec=[
        paddle.static.InputSpec(shape=[None, None], dtype="int64"),  # input_ids
        paddle.static.InputSpec(shape=[None, None], dtype="int64")  # segment_ids
    ])
# Save in static graph model.
paddle.jit.save(model, args.output_path)
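The two InputSpec entries with shape [None, None] describe int64 tensors of shape [batch, sequence_length], with both dimensions left dynamic. The toy sketch below builds such a batch by hand to show what the two inputs look like. The character-level vocabulary and the special-token ids are made up for illustration; they are not ERNIE's real vocabulary, and the real tokenizer should be used in practice.

```python
# Hypothetical special-token ids (not ERNIE's real ones).
PAD_ID, CLS_ID, SEP_ID = 0, 1, 2

def build_toy_vocab(texts):
    """Assign an id to every distinct character, after the special tokens."""
    vocab = {}
    for text in texts:
        for ch in text:
            vocab.setdefault(ch, len(vocab) + 3)
    return vocab

def encode_batch(texts, vocab):
    """Return (input_ids, segment_ids), each padded to the longest sequence."""
    encoded = [[CLS_ID] + [vocab[ch] for ch in text] + [SEP_ID] for text in texts]
    max_len = max(len(ids) for ids in encoded)
    input_ids = [ids + [PAD_ID] * (max_len - len(ids)) for ids in encoded]
    # Single-sentence classification: every position belongs to segment 0.
    segment_ids = [[0] * max_len for _ in encoded]
    return input_ids, segment_ids

texts = ["为什么还不下雨", "真开心"]
vocab = build_toy_vocab(texts)
input_ids, segment_ids = encode_batch(texts, vocab)
print(input_ids)
print(segment_ids)
```

Both lists have the same [batch, sequence_length] shape, which is what the exported static graph expects for its two inputs.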

Running the full script:
python pretrained_models/export_model.py --params_path=./checkpoint/model_27285/model_state.pdparams --output_path=./static_graph_param

Test
python pretrained_models/deploy/python/predict.py --model_file=./static_graph_param.pdmodel --params_file=./static_graph_param.pdiparams
ompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
Data: 减肥有所成效,继续努力 	 Label: 喜悦
Data: 找到 网络电视了啦 英超德甲 在线看不会卡 	 Label: 喜悦
Data: 真的很难过 发文问大家 	 Label: 喜悦
预测时间: 171.25 ms
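Each "Label" in the output comes from mapping the model's 4 output scores to an entry of label_map. A minimal sketch of that step, assuming the prediction script applies a softmax and takes the highest-probability class; the logits below are made up and do not come from the real model.

```python
import math

label_map = {0: '喜悦', 1: '愤怒', 2: '厌恶', 3: '低落'}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def to_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return label_map[idx], probs[idx]

# Made-up logits for a single sentence.
example_logits = [1.2, 0.3, -0.5, 0.9]
label, prob = to_label(example_logits)
print(label, round(prob, 3))
```

Printing the probability alongside the label makes it easier to see when the model is barely preferring one class over the others.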

3. Converting the Paddle model via ONNX into a format OpenVINO supports

Environment setup
pip install paddle2onnx onnx onnxruntime

Be sure to pass the exported static-graph model files; otherwise the conversion fails. In the run below, --model_dir points to ./checkpoint/, which does not contain static_graph_param.pdmodel:
paddle2onnx --model_dir ./checkpoint/ --model_filename static_graph_param.pdmodel --params_filename static_graph_param.pdiparams --save_file model.onnx --opset_version 11
Traceback (most recent call last):
  File "/home/l/anaconda3/envs/env_paddle/bin/paddle2onnx", line 8, in <module>
    sys.exit(main())
  File "/home/l/anaconda3/envs/env_paddle/lib/python3.7/site-packages/paddle2onnx/command.py", line 155, in main
    operator_export_type=operator_export_type)
  File "/home/l/anaconda3/envs/env_paddle/lib/python3.7/site-packages/paddle2onnx/command.py", line 113, in program2onnx
    params_filename=params_filename)
  File "/home/l/anaconda3/envs/env_paddle/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/l/anaconda3/envs/env_paddle/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/home/l/anaconda3/envs/env_paddle/lib/python3.7/site-packages/paddle/fluid/framework.py", line 236, in __impl__
    return func(*args, **kwargs)
  File "/home/l/anaconda3/envs/env_paddle/lib/python3.7/site-packages/paddle/fluid/io.py", line 1526, in load_inference_model
    with open(model_filename, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'checkpoint/static_graph_param.pdmodel'
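The FileNotFoundError above can be caught before invoking paddle2onnx by checking that both static-graph files exist in the directory passed as --model_dir. A small sketch using only the standard library; the helper name `missing_model_files` is hypothetical, and the default file prefix is taken from the export command earlier in this article.

```python
from pathlib import Path

def missing_model_files(model_dir, prefix="static_graph_param"):
    """Return the expected static-graph files that are absent from model_dir."""
    expected = [f"{prefix}.pdmodel", f"{prefix}.pdiparams"]
    return [name for name in expected if not (Path(model_dir) / name).is_file()]

missing = missing_model_files("./checkpoint/")
if missing:
    print("Missing in ./checkpoint/:", ", ".join(missing))
else:
    print("All model files found; safe to run paddle2onnx.")
```

Running the same check against the directory that actually holds the exported files should report nothing missing.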

Converting the exported static-graph model to ONNX with Paddle2ONNX
paddle2onnx --model_dir . --model_filename static_graph_param.pdmodel --params_filename static_graph_param.pdiparams --save_file model.onnx --opset_version 11

Output
model.onnx

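Loading the model with ONNX
import onnx
import onnxruntime as ort

# Note: the prediction model is replaced here
model_file = 'model.onnx'  # file written by the paddle2onnx step
onnx_model = onnx.load_model(model_file)
predictor = ort.InferenceSession(onnx_model.SerializeToString())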
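Loading the session is only half of the job; a forward pass still has to feed the two model inputs by name. The helper below is a sketch that assumes the exported graph uses the input names input_ids and token_type_ids (the names that appear in the OpenVINO snippet in this article) and that the first output holds the logits. `run_session` is a hypothetical name; `get_inputs()` and `run()` are the standard onnxruntime.InferenceSession methods.

```python
def run_session(session, input_ids, token_type_ids):
    """Run one forward pass and return the first output (assumed to be the logits).

    `session` is expected to be an onnxruntime.InferenceSession; both inputs
    should be int64 NumPy arrays of shape [batch, seq_len].
    """
    feeds = {"input_ids": input_ids, "token_type_ids": token_type_ids}

    # Fail early if the graph declares an input we are not providing.
    declared = [inp.name for inp in session.get_inputs()]
    unknown = [name for name in declared if name not in feeds]
    if unknown:
        raise KeyError(f"Model expects inputs not provided here: {unknown}")

    # Passing None as the output list asks for all outputs.
    return session.run(None, feeds)[0]
```

With the session created above this would be called as `run_session(predictor, input_ids, segment_ids)`.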

Accelerating inference with OpenVINO
from openvino.inference_engine import IENetwork, IECore, ExecutableNetwork

# Excerpt from a predictor class: self.model_file is the model path,
# input_ids and segment_ids are the tokenized inputs

# Load the model
ie = IECore()
net = ie.read_network(self.model_file)
# reshape based on the input
net.reshape({
    'input_ids': input_ids.shape,
    'token_type_ids': segment_ids.shape
})
# Create the inference request
predictor = ie.load_network(net, 'CPU')
assert isinstance(predictor, ExecutableNetwork)
output = predictor.infer({'input_ids': input_ids, 'token_type_ids': segment_ids})
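The snippet above uses the openvino.inference_engine module, which is the older Inference Engine API; later OpenVINO releases replaced it. For readers on a recent release, here is a rough equivalent with the newer Python API. It is a sketch only: it assumes OpenVINO 2023.1 or later, the same two input names as above, and that the first model output holds the logits. The function name `build_openvino_predictor` is hypothetical.

```python
def build_openvino_predictor(model_path, seq_len, batch_size=1, device="CPU"):
    """Compile an ONNX model with the newer OpenVINO Python API."""
    import openvino as ov  # imported lazily; assumes OpenVINO 2023.1 or later

    core = ov.Core()
    model = core.read_model(model_path)
    # Fix the dynamic [batch, seq_len] dimensions to concrete sizes.
    model.reshape({
        "input_ids": [batch_size, seq_len],
        "token_type_ids": [batch_size, seq_len],
    })
    compiled = core.compile_model(model, device)

    def predict(input_ids, token_type_ids):
        # Inputs are int64 NumPy arrays of shape [batch_size, seq_len].
        results = compiled({"input_ids": input_ids, "token_type_ids": token_type_ids})
        return results[compiled.output(0)]

    return predict
```

A call such as `predict = build_openvino_predictor("model.onnx", seq_len=128)` compiles the model once; `predict(input_ids, segment_ids)` can then be reused for every batch of that shape.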