eXtended WordNet

The eXtended WordNet is a project at the University of Texas at Dallas, funded by the National Science Foundation, that aims to improve WordNet by semantically parsing its glosses, making the information in these definitions available to automatic knowledge processing systems. It is freely available under a BSD-style license. Although it has not been updated since November 2004 (the most recent version is based on WordNet 2.0), it remains a useful resource.

Database format

The database is available as a set of four XML files, one each for verbs, adverbs, nouns and adjectives. The following information is extracted from each gloss:

Word sense disambiguation
Parse tree
Logic form

As an example, the following information is available for the synset excellent, first-class, fantabulous:

Gloss:

 of the highest quality

Word sense disambiguation:

  <wf pos="IN" >of</wf>
  <wf pos="DT" >the</wf>
  <wf pos="JJS" lemma="highest" quality="normal" wnsn="1" >highest</wf>
  <wf pos="NN" lemma="quality" quality="normal" wnsn="2" >quality</wf>
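The word-form markup above can be read with a standard XML parser. The following is a minimal sketch in Python using only the standard library. The enclosing `<wsd>` root element and the `read_tokens` helper are assumptions made for illustration; the actual layout of the distributed XML files may differ.

```python
import xml.etree.ElementTree as ET

# The <wf> elements from the example, wrapped in a hypothetical <wsd> root
# so that the fragment parses as a single XML document.
fragment = """<wsd>
  <wf pos="IN" >of</wf>
  <wf pos="DT" >the</wf>
  <wf pos="JJS" lemma="highest" quality="normal" wnsn="1" >highest</wf>
  <wf pos="NN" lemma="quality" quality="normal" wnsn="2" >quality</wf>
</wsd>"""


def read_tokens(xml_text):
    """Return one dict per <wf> element; missing attributes become None."""
    tokens = []
    for wf in ET.fromstring(xml_text).iter("wf"):
        wnsn = wf.get("wnsn")
        tokens.append({
            "word": wf.text,
            "pos": wf.get("pos"),
            "lemma": wf.get("lemma"),
            "quality": wf.get("quality"),
            # Function words such as "of" carry no sense number.
            "wnsn": int(wnsn) if wnsn is not None else None,
        })
    return tokens


tokens = read_tokens(fragment)
for token in tokens:
    print(token)
```

In this sketch, only the words that carry a `wnsn` attribute are treated as disambiguated; the others keep `None` for the sense number.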

Parse tree:

 (TOP (S (NP (JJ excellent) )
         (VP (VBZ is)
             (NP (NP (NN something) )
                 (PP (IN of)
                     (NP (DT the) (JJS highest) (NN quality) ) ) ) )
         (. .) ) )
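A bracketed tree of this kind can be loaded into a nested structure with a small recursive reader. The sketch below is illustrative only: `parse_tree` and `leaves` are hypothetical helper names, and the nested-list representation is one possible choice, not a format defined by the project.

```python
import re

tree_text = """(TOP (S (NP (JJ excellent) )
         (VP (VBZ is)
             (NP (NP (NN something) )
                 (PP (IN of)
                     (NP (DT the) (JJS highest) (NN quality) ) ) ) )
         (. .) ) )"""


def parse_tree(text):
    """Parse a bracketed tree into nested [label, child, ...] lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1                      # skip the closing ")"
        return [label] + children

    return node()


def leaves(tree):
    """Collect the leaf words from left to right."""
    words = []
    for child in tree[1:]:
        words.extend(leaves(child) if isinstance(child, list) else [child])
    return words


tree = parse_tree(tree_text)
print(tree[0])
print(leaves(tree))
```

Reading the leaves back in order shows that the parsed sentence is the gloss expanded into a full clause with the synset's first word as subject.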

Logic form:

 excellent:JJ(x1) -> of:IN (x1, x2) highest:JJ(x2) quality:NN(x2)
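The logic form is a flat list of predicates of the shape word:POS(arguments), with the defined word on the left of the arrow and the predicates derived from the gloss on the right. As a rough sketch, it can be split with a regular expression; the pattern and the `parse_logic_form` helper below are assumptions based on this single example and may not cover every notation used in the files.

```python
import re

logic_form = "excellent:JJ(x1) -> of:IN (x1, x2) highest:JJ(x2) quality:NN(x2)"

# word:POS(arg, arg, ...), allowing optional whitespace before the "(".
PREDICATE = re.compile(r"([^\s:()]+):(\w+)\s*\(([^)]*)\)")


def parse_predicates(text):
    """Return (word, pos, [arguments]) for each predicate found in text."""
    return [
        (word, pos, [arg.strip() for arg in args.split(",")])
        for word, pos, args in PREDICATE.findall(text)
    ]


def parse_logic_form(text):
    """Split a logic form at '->' into head and body predicate lists."""
    left, right = text.split("->", 1)
    return parse_predicates(left), parse_predicates(right)


head, body = parse_logic_form(logic_form)
print(head)
print(body)
```

Shared variables tie the predicates together: here `x2` links "highest" and "quality", and the preposition "of" relates `x1` to `x2`.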

Data quality

Each gloss is first tagged using Brill's tagger. The glosses are then parsed using both Charniak's parser and an in-house Collins-style parser. Each parsed gloss is then assigned one of three quality levels:

Gold: those that have been manually checked
Silver: those where both parsers have produced the same output
Normal: those where the two parsers have produced different outputs; in these situations the output of the in-house parser is used
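The three levels amount to a simple decision rule, sketched below in Python. The function name `assign_quality`, its parameters, and the whitespace-insensitive comparison are assumptions made for illustration; the project's actual comparison procedure is not described here.

```python
def normalize(parse):
    """Collapse whitespace so that layout differences do not count as disagreement."""
    return " ".join(parse.split())


def assign_quality(charniak_parse, inhouse_parse, checked_parse=None):
    """Return (quality level, parse to keep) for one gloss.

    checked_parse is a manually verified parse, if one exists (hypothetical
    parameter); otherwise the two automatic parses are compared.
    """
    if checked_parse is not None:
        return "gold", checked_parse
    if normalize(charniak_parse) == normalize(inhouse_parse):
        return "silver", inhouse_parse
    # The parsers disagree: keep the in-house parser's output.
    return "normal", inhouse_parse


print(assign_quality("(NP (NN quality))", "(NP (NN quality))"))
print(assign_quality("(NP (NN quality))", "(NP (JJ quality))"))
```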
