OMW Version 1

This page provides access to wordnets in a variety of languages, all linked to the Princeton WordNet of English (PWN). The individual wordnets were built by many different projects and vary greatly in size and accuracy. We have (i) extracted and normalized the data, (ii) linked it to Princeton WordNet 3.0 and (iii) put it in one place. Only wordnets whose license allows redistribution are included (Bond and Paik, 2012).

Search in OMW 1.3 (human curated)

We extended this with automatically extracted data for over 150 languages from Wiktionary and the Unicode Common Locale Data Repository (Bond and Foster, 2013).

Search in Extended OMW 1.3 (with Wiktionary and UCLDR data)

Data and Code

Summary of Wordnets

28 Available Wordnets
Wordnet Lang Synsets Words Senses Core Licence
Albanet als 4,675 5,988 9,599 31% CC BY 3.0
Arabic WordNet (AWN v2) arb 9,916 17,785 37,335 47% CC BY SA 3.0
BulTreeBank Wordnet (BTB-WN) bul 4,959 6,720 8,936 99% CC BY 3.0
Chinese Open Wordnet cmn 42,312 61,533 79,809 100% wordnet
Chinese Wordnet (Taiwan) cmn 4,913 3,206 8,069 28% wordnet
DanNet dan 4,476 4,468 5,859 81% wordnet
Greek Wordnet ell 18,049 18,227 24,106 57% Apache 2.0
Princeton WordNet eng 117,659 148,730 206,978 100% wordnet
Persian Wordnet fas 17,759 17,560 30,461 41% Free to use
FinnWordNet fin 116,763 129,839 189,227 100% CC BY 3.0
WOLF (Wordnet Libre du Français) fra 59,091 55,373 102,671 92% CeCILL-C
Hebrew Wordnet heb 5,448 5,325 6,872 27% wordnet
Croatian Wordnet hrv 23,120 29,008 47,900 100% CC BY 3.0
MultiWordNet ita 35,001 41,855 63,133 83% CC BY 3.0
Japanese Wordnet jpn 57,184 91,964 158,069 95% wordnet
Multilingual Central Repository cat 45,826 46,531 70,622 81% CC BY 3.0
Multilingual Central Repository eus 29,413 26,240 48,934 71% CC BY 3.0
Multilingual Central Repository glg 19,312 23,124 27,138 36% CC BY 3.0
Multilingual Central Repository spa 38,512 36,681 57,764 76% CC BY 3.0
Wordnet Bahasa ind 38,085 36,954 106,688 94% MIT
Wordnet Bahasa zsm 36,911 33,932 105,028 96% MIT
Norwegian Wordnet nno 3,671 3,387 4,762 66% wordnet
Norwegian Wordnet nob 4,455 4,186 5,586 81% wordnet
plWordNet pol 33,826 45,387 52,378 54% wordnet
OpenWN-PT por 43,895 54,071 74,012 84% CC BY SA
sloWNet slv 42,583 40,233 70,947 86% CC BY SA 3.0
Swedish (SALDO) swe 6,796 5,824 6,904 99% CC BY 3.0
Thai Wordnet tha 73,350 82,504 95,517 81% wordnet

Language codes are linked to the English Wikipedia.

Documentation and Notes

Core

Synsets marked with ✪ in the interface are in the semi-automatically compiled list of 5000 core word senses in Princeton WordNet (approximately the 5000 most frequently used word senses). The original list is available from http://wordnetcode.princeton.edu/standoff-files/core-wordnet.txt (Boyd-Graber et al., 2006). Our version of the list has been converted to use the Collaborative Interlingual Index.
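This page does not spell out how the Core column in the summary table is calculated. One plausible reading is the share of core synsets for which a wordnet has at least one entry. The sketch below computes that share under this assumption; the function name `core_coverage` and the synset ids in the usage example are hypothetical placeholders, not values taken from the core list.

```python
def core_coverage(wordnet_synsets, core_synsets):
    """Return the percentage of core synsets that also occur in a wordnet.

    Both arguments are iterables of synset ids in 'offset-pos' form.
    This is an assumed reading of the Core column, not a documented formula.
    """
    core = set(core_synsets)
    if not core:
        raise ValueError("core_synsets must not be empty")
    covered = core & set(wordnet_synsets)
    return 100.0 * len(covered) / len(core)


# Hypothetical ids, for illustration only.
toy_core = ["00000001-n", "00000002-n", "00000003-v", "00000004-a"]
toy_wordnet = ["00000001-n", "00000003-v", "00000009-n"]
print(f"{core_coverage(toy_wordnet, toy_core):.0f}%")  # 50%
```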

The wordnets are linked to the Suggested Upper Merged Ontology (SUMO: Niles and Pease, 2001; Pease, 2011); TempoWordNet (Dias et al., 2014); the multilingual, layered sentiment lexicons (ML-SentiCon: Cruz et al., 2014); and SentiWordNet 3.0 (Baccianella et al., 2010).

Mapping between wordnet versions was done using the mappings from TALP at UPC (Daudé et al., 2000).

Formats

Tab files

The wn-data-*.tab files are tab-separated: a header line with the project metadata, followed by lines that hold either a synset-lemma pair or a synset, sub-id, and definition or example.

# name␉lang␉url␉license
offset-pos␉lang:lemma␉word
offset-pos␉lang:def␉sid␉definition
offset-pos␉lang:exe␉sid␉example
...
name the name of the project
lang the ISO 639-3 three-letter code for the language
url the url of the project
license a short name for the license
offset the 8-digit Princeton WordNet 3.0 synset offset
pos one of [a,v,n,r] (we treat 's' as 'a')
lemma the lemma (word separator normalized to ' ')
sid the sub-id of the definition/example (starting from 0)

Example:

# Wordnet Bahasa	ind	http://wn-msa.sourceforge.net/	MIT 
00019613-n	ind:def	0	masalah fisik yang nyata
00019613-n	ind:lemma	inti
00019613-n	ind:lemma	unsur
11407591-n	ind:def	0	Novelis dan kritikus Perancis
11407591-n	ind:def	1	pembela Dreyfus
11407591-n	ind:lemma	Emile Zola
11407591-n	ind:lemma	Zola
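The format above can be read with a few lines of Python. The following is a minimal sketch, not the loader used by this project: the function name `parse_omw_tab` and the returned structure are assumptions made for illustration. It folds 's' into 'a' as described in the field list and skips lines that do not match the expected shape.

```python
import re
from collections import defaultdict

# 8-digit offset, a hyphen, then a part-of-speech letter.
SYNSET_RE = re.compile(r"^(\d{8})-([avnrs])$")


def parse_omw_tab(lines):
    """Parse the lines of a wn-data-*.tab file.

    Returns (meta, lemmas, definitions, examples):
      meta        -- dict with name, lang, url, license from the header line
      lemmas      -- synset id -> list of lemmas
      definitions -- synset id -> {sid: definition}
      examples    -- synset id -> {sid: example}
    """
    meta = {}
    lemmas = defaultdict(list)
    definitions = defaultdict(dict)
    examples = defaultdict(dict)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            if not meta:  # only the first comment line is treated as the header
                fields = [f.strip() for f in line.lstrip("#").split("\t")]
                meta = dict(zip(("name", "lang", "url", "license"), fields))
            continue
        fields = line.split("\t")
        match = SYNSET_RE.match(fields[0])
        if match is None or len(fields) < 3:
            continue  # skip malformed lines
        offset, pos = match.groups()
        synset = f"{offset}-{'a' if pos == 's' else pos}"  # treat 's' as 'a'
        _lang, _, kind = fields[1].partition(":")
        if kind == "lemma":
            lemmas[synset].append(fields[2])
        elif kind in ("def", "exe") and len(fields) >= 4 and fields[2].isdigit():
            target = definitions if kind == "def" else examples
            target[synset][int(fields[2])] = fields[3]
    return meta, dict(lemmas), dict(definitions), dict(examples)


# The example above, written with explicit tab escapes.
sample = [
    "# Wordnet Bahasa\tind\thttp://wn-msa.sourceforge.net/\tMIT",
    "00019613-n\tind:def\t0\tmasalah fisik yang nyata",
    "00019613-n\tind:lemma\tinti",
    "00019613-n\tind:lemma\tunsur",
    "11407591-n\tind:def\t0\tNovelis dan kritikus Perancis",
    "11407591-n\tind:def\t1\tpembela Dreyfus",
    "11407591-n\tind:lemma\tEmile Zola",
    "11407591-n\tind:lemma\tZola",
]
meta, lemmas, definitions, examples = parse_omw_tab(sample)
print(meta["lang"])                  # ind
print(lemmas["11407591-n"])          # ['Emile Zola', 'Zola']
print(definitions["11407591-n"][1])  # pembela Dreyfus
```

For a file on disk, passing an open file object (opened with encoding="utf-8") in place of `sample` should work the same way, since the function only iterates over lines.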

For this data to be really useful you need to combine it with the synset relations from Princeton WordNet.
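As a rough illustration of that combination, the sketch below assumes the Princeton WordNet 3.0 relations have already been exported as (source, relation, target) triples keyed by the same offset-pos ids; that export step is not shown here. The function name `lemmas_via_relation`, the relation label, and the toy ids are hypothetical and not taken from Princeton WordNet.

```python
from collections import defaultdict


def lemmas_via_relation(lemmas, relations, relation):
    """For each synset, collect the lemmas of the synsets it points to.

    lemmas    -- synset id -> list of lemmas in one language
    relations -- iterable of (source id, relation name, target id) triples
    relation  -- the relation name to follow, e.g. "hypernym"
    Links whose source or target has no lemmas in this language are skipped.
    """
    linked = defaultdict(list)
    for source, rel, target in relations:
        if rel == relation and source in lemmas and target in lemmas:
            linked[source].extend(lemmas[target])
    return dict(linked)


# Toy data for illustration only; ids and links are made up.
toy_lemmas = {
    "00000001-n": ["kata satu"],
    "00000002-n": ["kata dua", "kata tiga"],
}
toy_relations = [
    ("00000001-n", "hypernym", "00000002-n"),
    ("00000001-n", "hypernym", "00000003-n"),  # target has no lemmas: skipped
]
print(lemmas_via_relation(toy_lemmas, toy_relations, "hypernym"))
# {'00000001-n': ['kata dua', 'kata tiga']}
```

Because the synset ids are shared across all wordnets on this page, the same triples can be reused with the lemma dictionary of any language.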

Known Problems

Notes

There are some places where I made changes to harmonize different wordnets:

References

Francis Bond and Kyonghee Paik (2012)
A survey of wordnets and their licenses. In Proceedings of the 6th Global WordNet Conference (GWC 2012). Matsue. 64–71
Francis Bond and Ryan Foster (2013)
Linking and extending an open multilingual wordnet. In 51st Annual Meeting of the Association for Computational Linguistics: ACL-2013. Sofia. 1352–1362
J. Boyd-Graber, C. Fellbaum, D. Osherson and R. Schapire (2006)
Adding dense, weighted connections to WordNet. In Proceedings of the Third Global WordNet Meeting. Jeju Island, Korea, January 2006
Stefano Baccianella, Andrea Esuli and Fabrizio Sebastiani (2010)
SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10). Valletta, Malta
Fermín L. Cruz, José A. Troyano, Beatriz Pontes and F. Javier Ortega (2014)
Building layered, multilingual sentiment lexicons at synset and lemma levels. Expert Systems with Applications
Jordi Daudé, Lluís Padró and German Rigau (2000)
Mapping WordNets Using Structural Information. In 38th Annual Meeting of the Association for Computational Linguistics (ACL'2000). Hong Kong
Adam Pease (2011)
Ontology: A Practical Guide. Articulate Software Press, Angwin, CA. ISBN 978-1-889455-10-5
Ian Niles and Adam Pease (2001)
Toward a Standard Upper Ontology. In Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Chris Welty and Barry Smith, eds.
Gaël Dias, Mohammed Hasanuzzaman, Stéphane Ferrari and Yann Mathet (2014)
TempoWordNet for Sentence Time Tagging. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, pages 833–838, Switzerland

Maintainer: Francis Bond
Contributors: Francis Bond, Luís Morgado da Costa, Michael Goodman and all the wordnet projects.

Source code hosted at https://github.com/omwn/omwn.github.io.