Three tidy approaches to managing parameter values, with example code for predicting penguin body mass

Photo by Ian Parker on Unsplash

Finding optimal parameter values can significantly improve machine learning models. I enjoy experimenting with different values during tuning and watching my models' predictions improve. However, it can be tedious to figure out the names, ranges, values, and other details of each parameter. Fortunately, the dials package, part of tidymodels, was created to make parameter tuning much easier.

In this post, I present three ways to tune parameters with tidymodels and provide example code. Most of the tidymodels examples I have seen so far have not used dials in modelling. …

Hands-on Tutorials

An example of building models with two popular R packages together to predict bike-sharing demand

Photo by Chris Barbalis on Unsplash

Max Kuhn built both packages (with contributions from many other talented people). The caret package (short for Classification And REgression Training) streamlines the process of creating predictive models and has long been the top choice among R users. It has been around for a long time, so there are numerous resources, answers, and solutions for nearly every question. Tidymodels, on the other hand, is newer and is built on tidyverse principles. RStudio hired Max with the intention of designing a tidy version of caret.

I have been using caret for predictive modelling. While I am aware of tidymodels, I only began…

A tutorial on visualising job search results with a Sankey diagram

Photo by Cathryn Lavery on Unsplash

In this post, I first explain the function inputs for creating a Sankey diagram with the Plotly library in Python. I also provide the full script used to create my job search chart. If you are looking to create a similar graph, I hope this post helps!
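A Plotly Sankey trace identifies each node by its position in a list of labels, and each link by integer `source` and `target` indices into that list plus a `value`. The sketch below is a minimal illustration of that input format, not the script from the post: the helper name `build_sankey_inputs` and the job search counts are hypothetical. It turns labelled flows into the `node` and `link` dictionaries and uses only the standard library, so Plotly itself appears only in a trailing comment.

```python
def build_sankey_inputs(flows):
    """Convert (source, target, value) label triples into Sankey-style inputs.

    Nodes are referenced by their index in the label list, so each link's
    source and target are stored as integer positions in that list.
    """
    labels = []
    index = {}

    def node_id(label):
        # Give each new label the next free index, keeping first-seen order.
        if label not in index:
            index[label] = len(labels)
            labels.append(label)
        return index[label]

    source, target, value = [], [], []
    for src, dst, count in flows:
        source.append(node_id(src))
        target.append(node_id(dst))
        value.append(count)

    node = {"label": labels}
    link = {"source": source, "target": target, "value": value}
    return node, link


# Hypothetical job search counts, for illustration only.
flows = [
    ("Applications", "No response", 40),
    ("Applications", "Rejected", 15),
    ("Applications", "Interview", 5),
    ("Interview", "Offer", 1),
    ("Interview", "Rejected", 4),
]

node, link = build_sankey_inputs(flows)
print(node["label"])
print(link)

# With Plotly installed, the dictionaries can be passed to a Sankey trace:
#   import plotly.graph_objects as go
#   fig = go.Figure(go.Sankey(node=node, link=link))
#   fig.show()
```

Reusing a label such as "Rejected" for flows from different stages makes both flows end at the same node, which is usually what you want in a job search chart.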

If you are a Redditor subscribed to r/dataisbeautiful, you have probably seen graphs like these.

An example project using Selenium WebDriver to automate data scraping with Python and Google Chrome

Photo by Vernon Raineil Cenzon on Unsplash

Why this feature image? Because this post is about collecting data on scooter subsidies in Taiwan, and I miss the food at 饒河夜市 (Raohe Night Market).

Background

In spring 2020, I conducted research on Taiwan's electric scooter subsidy for my Environmental Economics course. Since electric scooters are more expensive than gasoline ones, I wanted to know whether the policy actually gave people an incentive to buy electric scooters, and for that I needed data.

I immediately ran into my first challenge: I could not get the data. More specifically, the type of data I needed, which is monthly subsidised counts for each city…

A short guide to working with XML files in Python and creating visualisations in Tableau Public

Photo by David Grandmougin on Unsplash

While I love working with online and public datasets, it is always fun to work with my own data. My iPhone is always with me and has been recording my walking distance and steps. I have rarely left my apartment since quarantine began in March, so I thought it would be interesting to see how my walking distance changed from 2016 to 2020.
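Reading the exported health data comes down to iterating over XML elements and aggregating their attributes. The sketch below uses Python's standard `xml.etree.ElementTree` to sum walking distance per year. It is an assumption-laden illustration, not the post's code: it assumes the export stores measurements as `<Record>` elements with `type`, `unit`, `startDate`, and `value` attributes under a `<HealthData>` root, the sample records are made up, and it assumes every distance record uses the same unit.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed layout of the health export: <Record> elements under <HealthData>.
# The values below are made up for illustration.
SAMPLE_XML = """<HealthData>
  <Record type="HKQuantityTypeIdentifierDistanceWalkingRunning" unit="km"
          startDate="2016-03-01 08:00:00 -0500" value="1.5"/>
  <Record type="HKQuantityTypeIdentifierDistanceWalkingRunning" unit="km"
          startDate="2016-03-02 09:30:00 -0500" value="2.0"/>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          startDate="2016-03-02 09:30:00 -0500" value="2500"/>
  <Record type="HKQuantityTypeIdentifierDistanceWalkingRunning" unit="km"
          startDate="2020-04-10 18:00:00 -0400" value="0.5"/>
</HealthData>"""

DISTANCE_TYPE = "HKQuantityTypeIdentifierDistanceWalkingRunning"


def yearly_distance(root):
    """Sum walking distance per year, assuming all records share one unit."""
    totals = defaultdict(float)
    for record in root.iter("Record"):
        if record.get("type") != DISTANCE_TYPE:
            continue  # skip step counts and any other record types
        year = int(record.get("startDate")[:4])  # startDate begins with YYYY
        totals[year] += float(record.get("value"))
    return dict(totals)


root = ET.fromstring(SAMPLE_XML)
print(yearly_distance(root))  # {2016: 3.5, 2020: 0.5}

# For a real export, parse the file instead (file name assumed):
#   root = ET.parse("export.xml").getroot()
```

Filtering on the `type` attribute keeps step counts out of the distance totals; the yearly dictionary can then be written to CSV for a tool such as Tableau Public.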

Export Data

A project report on building classification models with MICS microdata using caret in R

Photo by Dan Freeman on Unsplash

This post focuses on the overall predictive modelling workflow, including analysis strategy, model comparison, and limitations. If you are more interested in the technical aspects or the code, please see:

The repository is available on GitHub.

1. Introduction

Contraception is designed to prevent pregnancy. By preventing unintended pregnancy, contraceptive methods help women avoid pregnancy- and birth-related morbidity and mortality. Ensuring access to contraception advances human rights, including the right to life and liberty, freedom of opinion and expression, and the right to work and education.¹ In 2019, among the 1.9 …

An example project training and testing machine learning (ML) models with caret in R to predict contraceptive use

Photo by Christopher Gower on Unsplash

This post focuses on the technical aspects of this project. If you are interested in the study paper covering the overall workflow, including analysis strategy, model comparison, and limitations, please see:

1. Overview

"An Introduction to Statistical Learning: With Applications in R", or ISLR, was my first book on predictive analytics, and I strongly recommend it to everyone interested in machine learning. I learnt how to programme in R and use various modelling functions and packages, such as glm and randomForest, but juggling so many different packages felt inefficient. Fortunately, several libraries attempt to streamline…

Yu En Hsu

I am passionate about using data to make the world a better place, and I write about data science, visualisation, and machine learning.
