Brochure 2020

Early in the second year of the project (2020), we publicly released a commonsense word analogy dataset and published our research results at LREC 2020. Commonsense reasoning is fundamental for natural language agents to generalize inferences beyond their training corpora. Although the Natural Language Inference (NLI) task has proven a good pre-training objective for sentence representations, its commonsense coverage is limited, and most models are still end-to-end and heavily reliant on word representations to provide background world knowledge. Therefore, we propose to model commonsense knowledge down to word-level analogical reasoning. In this regard, existing analogy benchmarks are poor. Take, for example, Chinese Analogy (CA): the simplified Chinese dataset CA8 and the traditional Chinese dataset CA Google-translated from English contain only dozens of relations, most of which are either morphological (e.g., a shared prefix) or about named entities (such as capital-country).

However, commonsense knowledge bases (such as WordNet and ConceptNet) have long annotated relations in Chinese. E-HowNet currently annotates 88,000 Chinese words with their structured definitions and English translations. We investigated the extraction of accurate commonsense analogies from our commonsense representation model E-HowNet, resulting in an algorithm (CA-EHN) for extracting accurate analogies from E-HowNet with refinements from linguists. CA-EHN is the first commonsense analogy dataset, containing 90,505 analogies covering 5,656 words and 763 relations. Experimental analysis demonstrates that embedding more commonsense knowledge is useful and that CA-EHN can test this aspect of word embedding. Some examples of CA-EHN are shown in Figure 4.
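To illustrate how an analogy benchmark probes word embeddings, the standard evaluation solves a : b = c : ? by vector arithmetic over the embedding space (the 3CosAdd method). The sketch below uses tiny made-up vectors, not E-HowNet embeddings; note also that in CA-EHN the gold answer side is a synset (a set of words) rather than a single word, which this simplified version does not model.

```python
# Minimal sketch of word-analogy evaluation (a : b = c : ?), the task
# an analogy benchmark such as CA-EHN poses to word embeddings.
# The toy vectors below are illustrative assumptions, not real data.
import math

embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.5, 0.9, 0.2],
    "woman": [0.5, 0.2, 0.9],
    "apple": [0.1, 0.5, 0.5],
}

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def solve_analogy(a, b, c, vocab):
    # 3CosAdd: return the word d maximizing similarity to b - a + c,
    # excluding the three query words themselves.
    target = [vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))

print(solve_analogy("man", "king", "woman", embeddings))  # → queen
```

A benchmark simply scores an embedding by the fraction of its analogy questions answered correctly this way; richer relation inventories (as in CA-EHN's 763 relations) make that score reflect commonsense knowledge rather than morphology alone.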

Figure 4: CA-EHN (word:word=word:synset).
