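<div dir="ltr">Hi all,<div><br></div><div>I recently ran the following three experiments on the essay questions; unfortunately, none of them improved performance:</div><div>1. Retraining the model with the supporting evidence predicted on DRCD, provided by Dr. Fan</div><div>2. Trying logit-level and sentence-level ensembles over several base and large models</div><div>3. Feeding the TF-IDF vectors of the extracted sentences into an SVM to decide whether each sentence should be dropped</div><div><br></div><div><br></div><div>Since the FGC organizers limit essay-question answers to 100 characters, I ran the following analysis:</div><div>1. Set the model's stop-extraction probability threshold to 0 (i.e., all sentences in the passage are ranked by the model's predicted probability, from highest to lowest)</div><div>2. Truncate the ranked passage to a given number of characters and compute ROUGE (recall), which is treated as the model's upper bound</div><div><br></div><div><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">---- Validation set + test set (100 characters) ----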
<pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">avg. ROUGE-1(R): 75.52%
avg. ROUGE-2(R): 57.82%
avg. ROUGE-L(R): 61.44%</pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap"><br></pre></pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">---- Validation set + test set (125 characters) ----</pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap"><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap"><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">avg. ROUGE-1(R): 78.03%
avg. ROUGE-2(R): 58.65%
avg. ROUGE-L(R): 61.52%</pre>
---- Validation set + test set (150 characters) ----
<pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">avg. ROUGE-1(R): 81.59%
avg. ROUGE-2(R): 61.51%
avg. ROUGE-L(R): 62.28%</pre>
---- Validation set + test set (175 characters) ----
<pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">avg. ROUGE-1(R): 85.30%
avg. ROUGE-2(R): 64.62%
avg. ROUGE-L(R): 64.31%</pre>
---- Validation set + test set (200 characters) ----
<pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">avg. ROUGE-1(R): 89.59%
avg. ROUGE-2(R): 70.49%
avg. ROUGE-L(R): 68.75%</pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap"><br></pre>---- Validation set + test set (250 characters) ----
<pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">avg. ROUGE-1(R): 90.59%
avg. ROUGE-2(R): 70.95%
avg. ROUGE-L(R): 69.70%</pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap"><br></pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap"><br></pre>In summary, combining the results above with the following two statistics and a manual inspection:</pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">(1) The average sentence length in FGC passages is 15.86 characters</pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">(2) The current module extracts 5.31 sentences / 88.75 characters on average</pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap"><br></pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">For the current module to answer the questions as completely as possible (considering only recall maximization),</pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">it would need to recover roughly the sentences the model ranks top-10 to top-13 (about 160 to 205 characters at 15.86 characters per sentence, well over the 100-character limit).</pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">With the NN architecture currently in use, there is very little room for improvement.</pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap"><br></pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">Best regards,</pre><pre style="box-sizing:unset;border:none;margin-top:0px;margin-bottom:0px;padding:0px;overflow:auto;word-break:break-all;white-space:pre-wrap">郭家銍</pre></pre></div><div><br></div></div>
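<div>The upper-bound analysis in this message (threshold 0, rank sentences by probability, truncate to a character budget, score recall) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code that produced the reported numbers: all function names are hypothetical, ROUGE is computed over single characters (an assumption for Chinese text), and the ranked sentences are simply concatenated before truncation.</div>

```python
from collections import Counter


def rank_and_truncate(sentences, probs, budget):
    """Order sentences by predicted probability (highest first),
    concatenate them, and keep only the first `budget` characters."""
    ranked = sorted(zip(probs, sentences), key=lambda pair: pair[0], reverse=True)
    return "".join(sentence for _, sentence in ranked)[:budget]


def ngrams(text, n):
    """Multiset of character n-grams (assumption: one character = one token)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Share of the reference n-grams that also appear in the candidate."""
    ref = ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    cand = ngrams(candidate, n)
    # Clipped overlap: each reference n-gram is matched at most as often as it occurs.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)


# Toy data for illustration only (not taken from FGC).
truncated = rank_and_truncate(["abc", "de", "fgh"], [0.2, 0.9, 0.5], budget=5)
print(truncated, rouge_n_recall(truncated, "fgha", 1), rouge_l_recall(truncated, "fgha"))
```

<div>Averaging these recall values over all questions for each budget (100, 125, ..., 250 characters) would give one row per budget, in the same shape as the tables above.</div>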