GloVe (machine learning)

Short description: Algorithm for obtaining vector representations of words

GloVe, coined from Global Vectors, is a model for distributed word representation. The model is an unsupervised learning algorithm for obtaining vector representations for words. This is achieved by mapping words into a meaningful space where the distance between words is related to semantic similarity.^[1] Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. It is developed as an open-source project at Stanford^[2] and was launched in 2014. As log-bilinear regression model for unsupervised learning of word representations, it combines the features of two model families, namely the global matrix factorization and local context window methods.^[3]

Applications

GloVe can be used to find relations between words like synonyms, company-product relations, zip codes and cities, etc. However, the unsupervised learning algorithm is not effective in identifying homographs, i.e., words with the same spelling and different meanings. This is as the unsupervised learning algorithm calculates a single set of vectors for words with the same morphological structure.^[4]The algorithm is also used by the SpaCy library to build semantic word embedding features, while computing the top list words that match with distance measures such as cosine similarity and Euclidean distance approach.^[5] GloVe was also used as the word representation framework for the online and offline systems designed to detect psychological distress in patient interviews.^[1]

References

↑ ^1.0 ^1.1 Abad, Alberto; Ortega, Alfonso; Teixeira, António; Mateo, Carmen; Hinarejos, Carlos; Perdigão, Fernando; Batista, Fernando; Mamede, Nuno (2016) (in en). Advances in Speech and Language Technologies for Iberian Languages: Third International Conference, IberSPEECH 2016, Lisbon, Portugal, November 23-25, 2016, Proceedings. Cham: Springer. pp. 165. ISBN 9783319491691.
↑ GloVe: Global Vectors for Word Representation (pdf) "We use our insights to construct a new model for word representation which we call GloVe, for Global Vectors, because the global corpus statistics are captured directly by the model."
↑ Kalajdziski, Slobodan (2018) (in en). ICT Innovations 2018. Engineering and Life Sciences. Cham: Springer. pp. 220. ISBN 9783030008246.
↑ Wenig, Phillip (2019). "Creation of Sentence Embeddings Based on Topical Word Representations: An approach towards universal language understanding". Towards Data Science.
↑ Singh, Mayank; Gupta, P. K.; Tyagi, Vipin; Flusser, Jan; Ören, Tuncer I. (2018). Advances in Computing and Data Sciences: Second International Conference, ICACDS 2018, Dehradun, India, April 20-21, 2018, Revised Selected Papers. Singapore: Springer. pp. 171. ISBN 9789811318122.

External links

0.00

(0 votes)

[:0-1] 1.0 ^1.1 Abad, Alberto; Ortega, Alfonso; Teixeira, António; Mateo, Carmen; Hinarejos, Carlos; Perdigão, Fernando; Batista, Fernando; Mamede, Nuno (2016) (in en). Advances in Speech and Language Technologies for Iberian Languages: Third International Conference, IberSPEECH 2016, Lisbon, Portugal, November 23-25, 2016, Proceedings. Cham: Springer. pp. 165. ISBN 9783319491691.

[2] GloVe: Global Vectors for Word Representation (pdf) "We use our insights to construct a new model for word representation which we call GloVe, for Global Vectors, because the global corpus statistics are captured directly by the model."

[3] Kalajdziski, Slobodan (2018) (in en). ICT Innovations 2018. Engineering and Life Sciences. Cham: Springer. pp. 220. ISBN 9783030008246.

[4] Wenig, Phillip (2019). "Creation of Sentence Embeddings Based on Topical Word Representations: An approach towards universal language understanding". Towards Data Science.

[5] Singh, Mayank; Gupta, P. K.; Tyagi, Vipin; Flusser, Jan; Ören, Tuncer I. (2018). Advances in Computing and Data Sciences: Second International Conference, ICACDS 2018, Dehradun, India, April 20-21, 2018, Revised Selected Papers. Singapore: Springer. pp. 171. ISBN 9789811318122.

[1]

[2]

[3]

[4]

[5]

v t e Natural language processing
General terms	Natural language understanding Text corpus Speech corpus Stopwords Bag-of-words AI-complete n-gram (Bigram, Trigram)
Text analysis	Text segmentation Part-of-speech tagging Text chunking Compound term processing Collocation extraction Stemming Lemmatisation Named-entity recognition Coreference resolution Sentiment analysis Concept mining Parsing Word-sense disambiguation Ontology learning Terminology extraction Textual entailment Truecasing
Automatic summarization	Multi-document summarization Sentence extraction Text simplification
Machine translation	Computer-assisted Example-based Rule-based Neural
Automatic identification and data capture	Speech recognition Speech synthesis Optical character recognition Natural language generation
Topic model	Pachinko allocation Latent Dirichlet allocation Latent semantic analysis
Computer-assisted reviewing	Automated essay scoring Concordancer Grammar checker Predictive text Spell checker Syntax guessing
Natural language user interface	Automated online assistant Chatbot Interactive fiction Question answering Voice user interface

GloVe (machine learning)

Contents

Applications

See also

References

External links