
broom - Convert Statistical Objects into Tidy Tibbles
Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.
modeling, tidy-data
21.90 score 1.5k stars 1.6k dependents 61k scripts 735k downloads
yardstick - Tidy Characterizations of Model Performance
Tidy tools for quantifying how well a model fits a data set, such as confusion matrices, class probability curve summaries, and regression metrics (e.g., RMSE).
15.96 score 401 stars 69 dependents 3.3k scripts 56k downloads
paletteer - Comprehensive Collection of Color Palettes
The choice of color palettes in R can be quite overwhelming, with palettes spread over many packages with many different APIs. This package aims to collect all color palettes across the R ecosystem under the same package with a streamlined API.
color-palette, palettes
13.86 score 1.0k stars 40 dependents 12k scripts 64k downloads
sparsevctrs - Sparse Vectors for Use in Data Frames
Provides sparse vectors powered by ALTREP (Alternative Representations for R Objects) that behave like regular vectors, and can thus be used in data frames. Also provides tools to convert between sparse matrices and data frames with sparse columns and functions to interact with sparse vectors.
12.05 score 26 stars 489 dependents 30 scripts 165k downloads
tidypredict - Run Predictions Inside the Database
It parses a fitted 'R' model object and returns a formula in 'Tidy Eval' code that calculates the predictions. It works with several database back-ends because it leverages 'dplyr' and 'dbplyr' for the final 'SQL' translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), rpart(), earth(), xgb.Booster.complete(), lgb.Booster(), catboost.Model(), cubist(), and ctree() models.
dbplyr, dplyr, purrr, rlang
12.05 score 263 stars 2 dependents 294 scripts 1.4k downloads
lime - Local Interpretable Model-Agnostic Explanations
When building complex models, it is often difficult to explain why the model should be trusted. While global measures such as accuracy are useful, they cannot be used for explaining why a model made a specific prediction. 'lime' (a port of the 'lime' 'Python' package) is a method for explaining the outcome of black box models by fitting a local model around the point in question and perturbations of this point. The approach is described in more detail in the article by Ribeiro et al. (2016) <doi:10.48550/arXiv.1602.04938>.
caret, model-checking, model-evaluation, modeling, cpp
11.99 score 492 stars 3 dependents 860 scripts 4.3k downloads
bonsai - Model Wrappers for Tree-Based Models
Bindings for additional tree-based model engines for use with the 'parsnip' package. Models include gradient boosted decision trees with 'LightGBM' (Ke et al., 2017), conditional inference trees and conditional random forests with 'partykit' (Hothorn and Zeileis, 2015, and Hothorn et al., 2006 <doi:10.1198/106186006X133933>), and accelerated oblique random forests with 'aorsf' (Jaeger et al., 2022 <doi:10.5281/zenodo.7116854>).
11.06 score 54 stars 1 dependents 24k scripts 1.4k downloads
prismatic - Color Manipulation Tools
Manipulate and visualize colors in an intuitive, low-dependency and functional way.
color, color-manipulation, colour
10.92 score 147 stars 47 dependents 450 scripts 36k downloads
themis - Extra Recipes Steps for Dealing with Unbalanced Data
A dataset with an uneven number of cases in each class is said to be unbalanced. Many models produce subpar performance on unbalanced datasets. A dataset can be balanced by increasing the number of minority cases using SMOTE 2011 <doi:10.48550/arXiv.1106.1813>, BorderlineSMOTE 2005 <doi:10.1007/11538059_91> and ADASYN 2008 <https://ieeexplore.ieee.org/document/4633969>, or by decreasing the number of majority cases using NearMiss 2003 <https://www.site.uottawa.ca/~nat/Workshop2003/jzhang.pdf> or Tomek link removal 1976 <https://ieeexplore.ieee.org/document/4309452>.
10.60 score 142 stars 2 dependents 1.7k scripts 18k downloads
textdata - Download and Load Various Text Datasets
Provides a framework to download, parse, and store text datasets on the disk and load them when needed. Includes various sentiment lexicons and labeled text data sets for classification and analysis.
text-datasets
10.39 score 78 stars 3 dependents 1.5k scripts 9.1k downloads
textrecipes - Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.
10.06 score 164 stars 1 dependents 1.0k scripts 1.4k downloads
rules - Model Wrappers for Rule-Based Models
Bindings for additional models for use with the 'parsnip' package. Models include prediction rule ensembles (Friedman and Popescu, 2008) <doi:10.1214/07-AOAS148>, C5.0 rules (Quinlan, 1992 ISBN: 1558602380), and Cubist (Kuhn and Johnson, 2013) <doi:10.1007/978-1-4614-6849-3>.
10.05 score 42 stars 1 dependents 17k scripts 3.8k downloads
tidyclust - A Common API to Clustering
A common interface for specifying clustering models, in the same style as 'parsnip'. Creates a unified interface across different functions and computational engines.
9.63 score 113 stars 276 scripts 4.3k downloads
embed - Extra Recipes for Encoding Predictors
Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.
9.26 score 144 stars 1.2k scripts 1.6k downloads
discrim - Model Wrappers for Discriminant Analysis
Bindings for additional classification models for use with the 'parsnip' package. Models include flavors of discriminant analysis, such as linear (Fisher (1936) <doi:10.1111/j.1469-1809.1936.tb02137.x>), regularized (Friedman (1989) <doi:10.1080/01621459.1989.10478752>), and flexible (Hastie, Tibshirani, and Buja (1994) <doi:10.1080/01621459.1994.10476866>), as well as naive Bayes classifiers (Hand and Yu (2001) <doi:10.1111/j.1751-5823.2001.tb00465.x>).
8.55 score 31 stars 1.3k scripts 5.0k downloads
emoji - Data and Function to Work with Emojis
Contains data about emojis with relevant metadata, and functions to work with emojis when they are in strings.
7.74 score 28 stars 3 dependents 402 scripts 1.1k downloads
ggpage - Creates Page Layout Visualizations
Facilitates the creation of page layout visualizations in which words are represented as rectangles with sizes relating to the length of the words. The words are then divided into lines and pages, giving an easy overview of even quite large texts.
data-visualization, datavisualization, dataviz, ggplot2
7.57 score 342 stars 72 scripts 230 downloads
modelenv - Provide Tools to Register Models for Use in 'tidymodels'
A developer-focused, low-dependency package in 'tidymodels' that provides functions to register how models are to be used. Functions to register models are complemented with accessor functions to retrieve registered model information to aid in model fitting and error handling.
7.34 score 4 stars 56 dependents 1 scripts 32k downloads
orbital - Predict with 'tidymodels' Workflows in Databases
Turn 'tidymodels' workflows into objects containing the sequential equations sufficient to perform predictions. These smaller objects allow for low-dependency prediction locally or directly in databases.
7.30 score 48 stars 38 scripts 593 downloads
debrief - Text-Based Summaries for 'profvis' Profiling Data
Provides text-based summaries and analysis tools for 'profvis' profiling output. Designed for terminal workflows and artificial intelligence (AI) agent consumption, offering views including hotspot analysis, call trees, source context, caller/callee relationships, and memory allocation breakdowns.
5.54 score 14 stars 4 scripts 559 downloads
fastTextR - An Interface to the 'fastText' Library
An interface to the 'fastText' library <https://github.com/facebookresearch/fastText>. The package can be used for text classification and to learn word vectors. An example of how to use 'fastTextR' can be found in the 'README' file.
cpp
5.35 score 5 stars 1 dependents 50 scripts 308 downloads
friends - The Entire Transcript from Friends in Tidy Format
The complete scripts from the American sitcom Friends in tibble format. Use this package to practice data wrangling, text analysis and network analysis.
5.10 score 66 stars 38 scripts 266 downloads
hcandersenr - H.C. Andersens Fairy Tales
Texts for H.C. Andersen's fairy tales, ready for text analysis. Fairy tales in German, Danish, English, Spanish and French.
andersens-fairy-tales, text-mining
4.57 score 10 stars 74 scripts 208 downloads
walmartAPI - Walmart Open API Wrapper
Provides API access to the Walmart Open API <https://developer.walmartlabs.com/>, which contains data about stores, the Value of the Day, and products, including names, sale prices, shipping rates and taxonomies.
walmart-api
4.39 score 19 stars 13 scripts 180 downloads
methcon5 - Identify and Rank CpG DNA Methylation Conservation Along the Human Genome
Identify and rank CpG DNA methylation conservation along the human genome. Specifically, it includes bootstrapping methods to provide rankings that adjust for differences in region length; without this adjustment, short regions tend to get higher conservation scores.
2.70 score 6 scripts 179 downloads