CAO Xuefei, LI Jihong, WANG Ruibo, NIU Qian, WANG Yu. A Tuning Method for Bi-directional Long Short-Term Memory Neural Network Model Based on Robust Design[J]. Chinese Journal of Applied Probability and Statistics, 2022, 38(3): 317-332. DOI: 10.3969/j.issn.1001-4268.2022.03.001

A Tuning Method for Bi-directional Long Short-Term Memory Neural Network Model Based on Robust Design

Abstract: The bi-directional long short-term memory (BiLSTM) neural network model is widely used in natural language processing, but tuning it is difficult in practice. Taking the semantic role recognition task as an example, we treat four candidate features (word, part of speech, target word, and position) and two hyperparameters (the number of network layers and whether a CRF classifier is added on top) as factors in a robust design, set levels for each factor, and run experiments to select the optimal configuration of features and hyperparameters. Specifically, we run a full factorial experiment with 3×2 cross-validation on a small dataset (6,692 example sentences annotated with semantic roles) and select the optimal configuration using the larger-the-better signal-to-noise ratio (SNR) of robust design as the optimization objective. We then apply analysis of variance to the factors to quantify the influence of each factor on model performance, which gives the model a degree of interpretability. To verify the selected configuration, we also tune the model with the traditional greedy strategy on a large dataset (about 40,000 example sentences), using the 8:1:1 split that is standard in natural language processing, and compare the two configurations on the test set. The results show that our tuning method outperforms the traditional one.
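The selection criterion described in the abstract can be illustrated with a minimal sketch. It assumes the standard Taguchi larger-the-better SNR, SNR = -10 log10((1/n) Σ 1/y_i²), where y_1, ..., y_n are the scores of one configuration across the cross-validation runs (n = 6 for 3×2 cross-validation). The factor levels, the function names, and the scores in the usage note are hypothetical and are not taken from the paper; model training itself is not shown.

```python
import itertools
import math


def snr_larger_the_better(scores):
    """Taguchi larger-the-better SNR: -10 * log10(mean(1 / y_i^2))."""
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("scores must be non-empty and strictly positive")
    mean_inv_sq = sum(1.0 / (s * s) for s in scores) / len(scores)
    return -10.0 * math.log10(mean_inv_sq)


# Hypothetical factor levels (assumption: the paper's actual levels may differ).
FACTORS = {
    "word": (False, True),         # use the word feature or not
    "pos": (False, True),          # use the part-of-speech feature or not
    "target_word": (False, True),  # use the target-word feature or not
    "position": (False, True),     # use the position feature or not
    "num_layers": (1, 2),          # assumed levels for network depth
    "use_crf": (False, True),      # add a CRF classifier on top or not
}


def full_factorial(factors):
    """Yield every combination of factor levels as a dict."""
    names = list(factors)
    for combo in itertools.product(*(factors[name] for name in names)):
        yield dict(zip(names, combo))


def select_best(results):
    """Given (config, scores) pairs, return the (config, snr) pair with the highest SNR."""
    scored = ((config, snr_larger_the_better(scores)) for config, scores in results)
    return max(scored, key=lambda pair: pair[1])
```

As a usage illustration with made-up scores, `snr_larger_the_better([0.70] * 6)` is higher than `snr_larger_the_better([0.90, 0.50] * 3)` even though both lists have the same mean, because the criterion penalizes configurations whose performance varies across cross-validation runs. In a real experiment, each configuration yielded by `full_factorial(FACTORS)` would be trained and evaluated on the six cross-validation runs before being passed to `select_best`.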

     
