Finding optimal parameter values can significantly improve machine learning models. I enjoy experimenting with various values during tuning and seeing my models get better at prediction. However, it can be troublesome to figure out the names, ranges, values, and other aspects of the parameters. Fortunately, the dials library, which is part of tidymodels, was created to make parameter tuning a lot easier.
In this post, I present three ways to tune parameters with tidymodels and provide example code. Most examples of tidymodels I have seen so far have not included dials in modelling. …
Max Kuhn built both packages (with contributions from many other talented people). The caret package (short for Classification And REgression Training) streamlines the process of creating predictive models and has been the top choice among R users. It’s been around for a long time, so there are numerous resources, answers, and solutions for most questions you might have. On the other hand, tidymodels is newer and is built on tidyverse principles. RStudio hired Max with the intention of designing a tidy version of caret.
I have been using caret for predictive modelling. While I am aware of tidymodels, I only began…
In this post, I first explain the function inputs for creating a Sankey diagram with the Plotly library in Python. I also provide the full script used to create my job search chart. If you are looking to create a similar graph, I hope this post can help you!
If you are a Redditor subscribed to r/dataisbeautiful, you have probably seen these types of graphs:
Why this feature image? Because this post is about collecting data on scooter subsidies in Taiwan, and I miss the food in 饒河夜市 (Raohe night market).
In Spring 2020, I conducted research on Taiwan’s subsidy on electric scooters for my Environmental Economics course. Since electric scooters are more expensive than gasoline ones, I wanted to know whether the policy actually created incentives for people to purchase electric scooters, and I needed data.
I immediately ran into the first challenge: I could not get the data. More specifically, the type of data, which is monthly subsidised counts for each city…
While I love working with online and public datasets, it’s always fun to work with my own data. My iPhone is always with me and has been recording my walking distance and steps. I have rarely left my apartment since the quarantine began in March, so I thought it would be interesting to see the change in my walking distance from 2016 to 2020.
This post focuses on the overall predictive modelling workflow, including analysis strategy, model comparison, and limitations. If you are more interested in the technical aspect or viewing the code, please see:
The repository is available on GitHub.
Contraception is designed to prevent pregnancy. By preventing unintended pregnancy, contraceptive methods help women to avoid pregnancy- and birth-related morbidity and mortality. Ensuring access to contraception advances human rights, including the right to life and liberty, freedom of opinion and expression, and the right to work and education.¹ In 2019, among the 1.9 …
caret in R to predict contraceptive use
This post focuses on the technical aspect of this project. If you’re interested in viewing the study paper to learn about the overall workflow, including analysis strategy, model comparison, and limitations, please view:
“An Introduction to Statistical Learning: With Applications in R” or ISLR was my first book on predictive analytics, and I strongly recommend it to everyone interested in machine learning. I learnt how to program in R and use various statistical functions and packages, such as glm and randomForest, but having so many different packages felt inefficient. Fortunately, there are several libraries available that attempt to streamline…
I am passionate about using data to make the world a better place, and I write about data science, visualisation, and machine learning.