Open Source Dutch WordNet is an open source version of Cornetto (Vossen et al., 2013). Cornetto is currently not distributed as open source, because a large portion of the database originates from the commercial publisher Van Dale.
We use English WordNet 3.0 (Miller, 1995; Fellbaum, 1998) as our basis. This means that we replace the Van Dale synsets and internal semantic relations with WordNet synsets and internal semantic relations.
Demo of Open Source Dutch WordNet. Release of the first version of Open Dutch WordNet (ODWN): December 2, 2014. By Marten Postma and Piek Vossen.
We replace the English content in English WordNet in two ways:
1. When an external semantic relation (ESR) exists between a Cornetto synset and a WordNet synset, all synonyms from the Cornetto synset are inserted into the WordNet synset. The ESRs were filtered before this technique was applied. Four students manually checked 12966 relations, of which 6575 were removed. The unchecked relations were then filtered with a decision tree algorithm trained on the manually inspected relations. This resulted in the removal of 32258 ESRs (with an F-score of 0.80, as evaluated on the manual set).
2. Using open source resources (Wikipedia (Wikipedia, 2014; Foundation, 2014a), Wiktionary (Foundation, 2014b), Google Translate (Google, 2014), BabelNet (Navigli, 2010)), the English synonyms in English WordNet are translated into Dutch.
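The first method can be illustrated with a minimal Python sketch. It is not the project's actual code: the function name `fill_synsets`, the synset identifiers, and the toy data are all hypothetical, and the set `accepted` simply stands in for the ESRs that survived the manual and decision tree filtering.

```python
from collections import defaultdict


def fill_synsets(esrs, cornetto_synonyms, accepted):
    """Copy Dutch synonyms from Cornetto synsets into linked WordNet synsets.

    esrs: iterable of (cornetto_id, wordnet_id) pairs (hypothetical format).
    cornetto_synonyms: dict mapping a Cornetto synset id to its Dutch synonyms.
    accepted: set of ESR pairs that passed filtering.
    """
    dutch_by_wordnet = defaultdict(set)
    for cornetto_id, wordnet_id in esrs:
        # Skip relations that were removed during filtering.
        if (cornetto_id, wordnet_id) not in accepted:
            continue
        # Insert all synonyms of the Cornetto synset into the WordNet synset.
        dutch_by_wordnet[wordnet_id].update(cornetto_synonyms.get(cornetto_id, ()))
    return dict(dutch_by_wordnet)


# Toy data with made-up identifiers, for illustration only.
esrs = [("c_hond", "wn_dog"), ("c_kat", "wn_cat"), ("c_bank", "wn_dog")]
cornetto_synonyms = {
    "c_hond": ["hond"],
    "c_kat": ["kat", "poes"],
    "c_bank": ["bank"],
}
accepted = {("c_hond", "wn_dog"), ("c_kat", "wn_cat")}

filled = fill_synsets(esrs, cornetto_synonyms, accepted)
```

In this toy run the rejected relation between `c_bank` and `wn_dog` contributes nothing, so `wn_dog` receives only "hond" while `wn_cat` receives both "kat" and "poes".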
Open Source Dutch WordNet contains 116992 synsets, of which 95356 originate from WordNet 3.0 and 21636 are new synsets.
The number of English synsets without Dutch synonyms is 60743, which means that 34613 WordNet 3.0 synsets have been filled with at least one Dutch synonym.
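As a quick consistency check, the counts above can be recomputed in a few lines of Python. The variable names are chosen here for illustration; the coverage percentage is derived from the stated figures and is not reported in the original text.

```python
# Counts as stated in the text.
total_synsets = 116992
from_wordnet = 95356
new_synsets = 21636
without_dutch = 60743

# The two sources of synsets should add up to the total.
assert from_wordnet + new_synsets == total_synsets

# WordNet 3.0 synsets that received at least one Dutch synonym.
with_dutch = from_wordnet - without_dutch
coverage = with_dutch / from_wordnet

print(with_dutch)          # 34613
print(f"{coverage:.1%}")   # 36.3%
```

So roughly a third of the WordNet 3.0 synsets carry at least one Dutch synonym in this release.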