R packages by emilhvitfeldt

yardstick - Tidy Characterizations of Model Performance

Tidy tools for quantifying how well a model fits a data set, such as confusion matrices, class probability curve summaries, and regression metrics (e.g., RMSE).

Last updated 18 days ago

15.24 score 383 stars 59 dependents 2.2k scripts 36k downloads
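
As a rough illustration of two of the metrics named above, the following Python sketch computes RMSE and a confusion matrix from scratch. It is not yardstick's API; the function names `rmse` and `confusion_matrix` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def rmse(truth, estimate):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(truth) != len(estimate) or len(truth) == 0:
        raise ValueError("truth and estimate must be non-empty and equal in length")
    squared_errors = [(t - e) ** 2 for t, e in zip(truth, estimate)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def confusion_matrix(truth, predicted):
    """Count how often each (true label, predicted label) pair occurs."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must be equal in length")
    return Counter(zip(truth, predicted))


# Small usage example with made-up values.
error = rmse([0.0, 0.0], [3.0, 4.0])
counts = confusion_matrix(["a", "a", "b"], ["a", "b", "b"])
```

Here `counts[("a", "b")]` holds the number of cases whose true class is "a" but which were predicted as "b".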

paletteer - Comprehensive Collection of Color Palettes

The choice of color palettes in R can be quite overwhelming, with palettes spread over many packages with many different APIs. This package aims to collect all color palettes across the R ecosystem under the same package with a streamlined API.

Last updated 8 months ago

color-palette, palettes

14.01 score 952 stars 22 dependents 7.0k scripts 192k downloads

prismatic - Color Manipulation Tools

Manipulate and visualize colors in an intuitive, low-dependency, and functional way.

Last updated 3 months ago

color, color-manipulation, colour

11.65 score 138 stars 29 dependents 428 scripts 173k downloads

tidypredict - Run Predictions Inside the Database

It parses a fitted 'R' model object and returns a formula in 'Tidy Eval' code that calculates the predictions. It works with several database back-ends because it leverages 'dplyr' and 'dbplyr' for the final 'SQL' translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), earth(), xgb.Booster.complete(), cubist(), and ctree() models.

Last updated 2 months ago

dbplyr, dplyr, purrr, rlang

11.03 score 261 stars 2 dependents 241 scripts 1.5k downloads
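
The general idea of turning a fitted model into an expression a database can evaluate can be sketched in a few lines of Python. This is only an illustration for a linear model, not how tidypredict itself is implemented; the function `linear_model_to_sql`, the table `cars`, and the coefficient values are all made up for the example.

```python
import sqlite3


def linear_model_to_sql(intercept, coefficients):
    """Build a SQL arithmetic expression for intercept + sum(weight * column).

    Column names are wrapped in double quotes; names that themselves contain
    double quotes are not handled in this sketch.
    """
    terms = [repr(float(intercept))]
    for column, weight in coefficients.items():
        terms.append(f'({float(weight)!r} * "{column}")')
    return " + ".join(terms)


# Hypothetical coefficients, as if taken from an already fitted linear model.
expr = linear_model_to_sql(1.5, {"wt": -2.0, "hp": 0.25})

# Evaluate the expression inside an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (wt REAL, hp REAL)")
conn.execute("INSERT INTO cars VALUES (2.0, 100.0)")
(prediction,) = conn.execute(f"SELECT {expr} FROM cars").fetchone()
conn.close()
```

The prediction is computed by the database engine, so the rows never have to be pulled into the modeling environment.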

sparsevctrs - Sparse Vectors for Use in Data Frames

Provides sparse vectors powered by ALTREP (Alternative Representations for R Objects) that behave like regular vectors, and can thus be used in data frames. Also provides tools to convert between sparse matrices and data frames with sparse columns and functions to interact with sparse vectors.

Last updated 4 days ago

11.02 score 14 stars 212 dependents 21 scripts 70k downloads
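
The storage idea behind a sparse vector can be sketched as follows: keep only the positions and values of the non-default entries, but answer indexing and length queries like an ordinary vector. This Python sketch says nothing about ALTREP or the package's actual internals; the class `SparseVector` and its methods are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class SparseVector:
    length: int            # logical length, including default entries
    positions: tuple       # sorted 0-based indices of non-default entries
    values: tuple          # values stored at those indices
    default: float = 0.0   # value reported for every other index

    @classmethod
    def from_dense(cls, dense, default=0.0):
        """Keep only the entries that differ from the default value."""
        pairs = [(i, v) for i, v in enumerate(dense) if v != default]
        return cls(
            len(dense),
            tuple(i for i, _ in pairs),
            tuple(v for _, v in pairs),
            default,
        )

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError("index out of range")
        # Binary search over the stored positions.
        slot = bisect_left(self.positions, index)
        if slot < len(self.positions) and self.positions[slot] == index:
            return self.values[slot]
        return self.default

    def to_dense(self):
        """Materialize the full vector as a list."""
        return [self[i] for i in range(self.length)]


vec = SparseVector.from_dense([0.0, 0.0, 3.0, 0.0, 5.0])
```

Only two of the five entries are stored, yet `vec[1]` and `len(vec)` behave as they would for the dense list.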

lime - Local Interpretable Model-Agnostic Explanations

When building complex models, it is often difficult to explain why the model should be trusted. While global measures such as accuracy are useful, they cannot be used for explaining why a model made a specific prediction. 'lime' (a port of the 'lime' 'Python' package) is a method for explaining the outcome of black box models by fitting a local model around the point in question and perturbations of this point. The approach is described in more detail in the article by Ribeiro et al. (2016) <arXiv:1602.04938>.

Last updated 3 years ago

caret, model-checking, model-evaluation, modeling, cpp

10.98 score 486 stars 1 dependents 732 scripts 1.5k downloads

textrecipes - Extra 'Recipes' for Text Processing

Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.

Last updated 26 days ago

10.76 score 160 stars 1 dependents 992 scripts 729 downloads
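
To make the tokenization, counting, and feature hashing steps mentioned above concrete, here is a small Python sketch of each idea. It does not use or mirror the textrecipes API; the function names, the regular expression, and the choice of SHA-256 as the hash are assumptions made for this example.

```python
import hashlib
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_frequency(tokens):
    """Count how many times each token occurs (raw tf)."""
    return Counter(tokens)


def hash_features(tokens, n_buckets=8):
    """Map tokens into a fixed-length count vector via a stable hash.

    Different tokens may collide in the same bucket; that is the usual
    trade-off of feature hashing.
    """
    vector = [0] * n_buckets
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % n_buckets
        vector[bucket] += 1
    return vector


tokens = tokenize("The cat, the hat!")
tf = term_frequency(tokens)
hashed = hash_features(tokens, n_buckets=8)
```

The hashed vector always has `n_buckets` entries regardless of vocabulary size, which is what makes the technique attractive for large text corpora.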

textdata - Download and Load Various Text Datasets

Provides a framework to download, parse, and store text datasets on the disk and load them when needed. Includes various sentiment lexicons and labeled text data sets for classification and analysis.

Last updated 9 months ago

text-datasets

9.76 score 75 stars 1 dependents 1.3k scripts 4.0k downloads

themis - Extra Recipes Steps for Dealing with Unbalanced Data

A dataset with an uneven number of cases in each class is said to be unbalanced. Many models perform poorly on unbalanced datasets. A dataset can be balanced by increasing the number of minority cases using SMOTE 2011 <doi:10.48550/arXiv.1106.1813>, BorderlineSMOTE 2005 <doi:10.1007/11538059_91> and ADASYN 2008 <https://ieeexplore.ieee.org/document/4633969>, or by decreasing the number of majority cases using NearMiss 2003 <https://www.site.uottawa.ca/~nat/Workshop2003/jzhang.pdf> or Tomek link removal 1976 <https://ieeexplore.ieee.org/document/4309452>.

Last updated 27 days ago

9.73 score 143 stars 1 dependents 1.1k scripts 5.0k downloads
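
The core step of SMOTE-style oversampling, creating a synthetic minority case by interpolating between an existing case and one of its nearest minority neighbors, can be sketched as below. This is a simplified illustration rather than the themis implementation; the function `smote_sample`, its parameters, and the example points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point from a list of minority-class points.

    Picks a random base point, chooses one of its k nearest minority
    neighbors, and returns a random point on the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = rng or random.Random()
    base_index = rng.randrange(len(minority))
    base = minority[base_index]
    others = [p for i, p in enumerate(minority) if i != base_index]
    neighbors = sorted(others, key=lambda p: squared_distance(base, p))[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # interpolation weight in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbor))


minority_points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
synthetic = smote_sample(minority_points, k=2, rng=random.Random(0))
```

Because the new point is a convex combination of two existing minority cases, it always lies between them in feature space.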

rules - Model Wrappers for Rule-Based Models

Bindings for additional models for use with the 'parsnip' package. Models include prediction rule ensembles (Friedman and Popescu, 2008) <doi:10.1214/07-AOAS148>, C5.0 rules (Quinlan, 1992 ISBN: 1558602380), and Cubist (Kuhn and Johnson, 2013) <doi:10.1007/978-1-4614-6849-3>.

Last updated 4 months ago

9.47 score 40 stars 1 dependents 20k scripts 986 downloads

embed - Extra Recipes for Encoding Predictors

Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.

Last updated 20 days ago

9.30 score 142 stars 1.1k scripts 1.5k downloads

discrim - Model Wrappers for Discriminant Analysis

Bindings for additional classification models for use with the 'parsnip' package. Models include flavors of discriminant analysis, such as linear (Fisher (1936) <doi:10.1111/j.1469-1809.1936.tb02137.x>), regularized (Friedman (1989) <doi:10.1080/01621459.1989.10478752>), and flexible (Hastie, Tibshirani, and Buja (1994) <doi:10.1080/01621459.1994.10476866>), as well as naive Bayes classifiers (Hand and Yu (2001) <doi:10.1111/j.1751-5823.2001.tb00465.x>).

Last updated 4 months ago

8.02 score 28 stars 1 dependents 992 scripts 955 downloads

emoji - Data and Function to Work with Emojis

Contains data about emojis with relevant metadata, and functions to work with emojis when they are in strings.

Last updated 4 months ago

7.94 score 28 stars 3 dependents 306 scripts 1.1k downloads

ggpage - Creates Page Layout Visualizations

Facilitates the creation of page layout visualizations in which words are represented as rectangles with sizes relating to the length of the words. The words are then divided into lines and pages to give an easy overview of even quite large texts.

Last updated 6 years ago

data-visualization, datavisualization, dataviz, ggplot2

7.53 score 340 stars 66 scripts 236 downloads

tidyclust - A Common API to Clustering

A common interface for specifying clustering models, in the same style as 'parsnip'. Creates a unified interface across different functions and computational engines.

Last updated 22 days ago

7.36 score 110 stars 139 scripts 861 downloads

modelenv - Provide Tools to Register Models for Use in 'tidymodels'

A developer-focused, low-dependency package in 'tidymodels' that provides functions to register how models are to be used. Functions to register models are complemented with accessor functions to retrieve registered model information to aid in model fitting and error handling.

Last updated 4 months ago

7.08 score 4 stars 43 dependents 1 scripts 23k downloads

orbital - Predict with 'tidymodels' Workflows in Databases

Turn 'tidymodels' workflows into objects containing the sequential equations sufficient to perform predictions. These smaller objects allow for low-dependency prediction locally or directly in databases.

Last updated 2 months ago

6.22 score 25 stars 11 scripts 353 downloads

fastTextR - An Interface to the 'fastText' Library

An interface to the 'fastText' library <https://github.com/facebookresearch/fastText>. The package can be used for text classification and to learn word vectors. An example of how to use 'fastTextR' can be found in the 'README' file.

Last updated 1 year ago

cpp

5.50 score 4 stars 2 dependents 44 scripts 479 downloads

friends - The Entire Transcript from Friends in Tidy Format

The complete scripts from the American sitcom Friends in tibble format. Use this package to practice data wrangling, text analysis and network analysis.

Last updated 3 years ago

5.03 score 63 stars 34 scripts 197 downloads

hcandersenr - H.C. Andersens Fairy Tales

Texts of H.C. Andersen's fairy tales, ready for text analysis. Fairy tales in German, Danish, English, Spanish and French.

Last updated 5 years ago

andersens-fairy-tales, text-mining

4.62 score 10 stars 83 scripts 213 downloads

modeldatatoo - More Data Sets Useful for Modeling Examples

More data sets used for demonstrating or testing model-related packages are contained in this package. The data sets are downloaded and cached, allowing for more and bigger data sets.

Last updated 11 months ago

4.55 score 7 stars 34 scripts 201 downloads

walmartAPI - Walmart Open API Wrapper

extrasteps - More Miscellaneous Steps for the 'recipes' Package

wordsalad - Provide Tools to Extract and Analyze Word Vectors

methcon5 - Identify and Rank CpG DNA Methylation Conservation Along the Human Genome