---
title: fastTextR - Word representations
output: html_document
vignette: >
  %\VignetteIndexEntry{Word_representations}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

**fastTextR** is an **R** interface to the [fastText](https://github.com/facebookresearch/fastText) library. It can be used for **word representation learning** *(Bojanowski et al., 2016)* and **supervised text classification** *(Joulin et al., 2016)*. A particular advantage of **fastText** over other software is that it was designed for large data sets.

The following example is based on the examples provided in the **fastText** library and shows how to use **fastTextR** for word representations. More information about word representations can be found on the [fastText homepage](https://fasttext.cc/docs/en/unsupervised-tutorial.html).

```{r, eval=FALSE}
library("fastTextR")
```

## Load pretrained model

Training these models can be quite time-consuming, so pre-trained models are a good option.

```{r, eval=FALSE}
# Load a pre-trained model from a local file
# (here the English Common Crawl vectors with 300 dimensions).
model <- ft_load("cc.en.300.bin")
```

## Printing word vectors

```{r, eval=FALSE}
# One row per word; show only the first five dimensions.
ft_word_vectors(model, c("asparagus", "pidgey", "yellow"))[, 1:5]
```

```
##                   [,1]          [,2]         [,3]       [,4]         [,5]
## asparagus 0.0292057190 -0.0114405714 -0.003201437 0.03087331  0.127229080
## pidgey    0.0452978685  0.0090015158  0.067562237 0.11123407 -0.008441916
## yellow    0.0007776691 -0.0001886144  0.001824494 0.03869999  0.036413591
```

## Printing sentence vectors

```{r, eval=FALSE}
# One row per sentence; show only the first five dimensions.
sentences <- c("Poets have been mysteriously silent on the subject of cheese",
               "Who did not let the gorilla into the ballet")
ft_sentence_vectors(model, sentences)[, 1:5]
```

```
???
```

## Nearest neighbor queries

```{r, eval=FALSE}
# The five words closest to "asparagus", with their similarity scores.
ft_nearest_neighbors(model, "asparagus", k = 5L)
```

```
##   aspargus broccolini artichokes asparagus.  asparagas
##  0.7316202  0.6995656  0.6930545  0.6915916  0.6911229
```

## Word analogies

```{r, eval=FALSE}
# "berlin" is to "germany" as ? is to "france".
ft_analogies(model, c("berlin", "germany", "france"))
```

```
##        paris      france.      avignon  montpellier       paris.
##    0.6831182    0.6408537    0.6288283    0.6138449    0.6059716
##       rennes       london       Paris.       toulon montparnasse
##    0.5884554    0.5832924    0.5743204    0.5727922    0.5715630
```

## References

[1] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, [*Enriching Word Vectors with Subword Information*](https://arxiv.org/abs/1607.04606)

```
@article{bojanowski2016enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.04606},
  year={2016}
}
```

[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, [*Bag of Tricks for Efficient Text Classification*](https://arxiv.org/abs/1607.01759)

```
@article{joulin2016bag,
  title={Bag of Tricks for Efficient Text Classification},
  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.01759},
  year={2016}
}
```
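
## Appendix: similarity scores on toy vectors

The scores returned by the nearest neighbor and analogy queries can be read as cosine similarities between word vectors. The sketch below illustrates that idea in base R on a tiny hand-made matrix. It is not how **fastTextR** is implemented: `toy_vectors` and `cosine_similarity` are hypothetical names, the numbers are invented, and the analogy query is assumed to be formed as `berlin - germany + france`.

```r
# Toy 3-dimensional vectors; the values are invented for illustration and
# do not come from any fastText model.
toy_vectors <- rbind(
  berlin  = c(0.9, 0.1, 0.8),
  germany = c(0.1, 0.1, 0.9),
  paris   = c(0.9, 0.8, 0.1),
  france  = c(0.1, 0.9, 0.1)
)

# Cosine similarity between one query vector and every row of a matrix.
cosine_similarity <- function(query, mat) {
  drop(mat %*% query) / (sqrt(rowSums(mat^2)) * sqrt(sum(query^2)))
}

# Nearest neighbors of "berlin": rank all other words by similarity.
nn_scores <- cosine_similarity(toy_vectors["berlin", ], toy_vectors)
nn_scores <- sort(nn_scores[names(nn_scores) != "berlin"], decreasing = TRUE)

# Analogy "berlin is to germany as ? is to france" (assumed construction):
# build the query berlin - germany + france and rank all words against it.
query <- toy_vectors["berlin", ] - toy_vectors["germany", ] +
  toy_vectors["france", ]
analogy_scores <- sort(cosine_similarity(query, toy_vectors), decreasing = TRUE)

print(round(nn_scores, 3))
print(round(analogy_scores, 3))
```

With these toy values, `germany` is the closest neighbor of `berlin` and `paris` ranks first for the analogy query. A real model works the same way in spirit, only with 300 dimensions and a vocabulary of millions of words, which is why misspellings such as `aspargus` appear among the neighbors above.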