Lifecycle: development License: MIT


What is recodeflow?

recodeflow recodes variables from multiple data sets into harmonized variables.

recodeflow has basic functions and templates required to define, recode, and harmonize variables for any dataset.

Why should I use recodeflow?

Recoding and cleaning your data is typically the most time-consuming step of your project. Existing functions such as sjmisc::rec() and dplyr::recode() work well, but they are limited to recoding one variable at a time.

The recodeflow package takes data cleaning and recoding one step further. recodeflow allows you to recode multiple variables at the same time, and harmonize variables across similar databases even when the variables and variables’ categories change.

recodeflow also helps to reduce errors, documents the recode process, and ensures your new variables have labels and other metadata.

Even if your project has few variables, recodeflow can save you time.

How does recodeflow work?

Use the worksheets variables and variable_details to list your variables and state how to recode each variable.

Once your variables are defined, use recodeflow functions to clean and recode your data. The main recodeflow function is rec_with_table, which recodes variables within your dataset(s) based on how you’ve defined each variable in the worksheets variables and variable_details.
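To make the worksheet-driven idea concrete, here is a minimal base R sketch of recoding several variables from one rules table. It is not the recodeflow implementation and does not call rec_with_table. The column names (variable, source, from, to), the sample data, and the helper recode_from_sheet are all hypothetical simplifications. The real variable_details worksheet carries more information, such as labels and other metadata.

```r
# Simplified stand-in for a variable_details worksheet: one row per recode rule.
# Column names are illustrative assumptions, not the package's actual schema.
details <- data.frame(
  variable = c("sex", "sex", "age_group", "age_group"),
  source   = c("SEX", "SEX", "AGE_GRP", "AGE_GRP"),
  from     = c("1", "2", "1", "2"),
  to       = c("male", "female", "18-39", "40+"),
  stringsAsFactors = FALSE
)

# Hypothetical raw data using the source variable names and codes.
raw <- data.frame(SEX = c(1, 2, 2, 9), AGE_GRP = c(2, 1, 1, 2))

# Hypothetical helper: recode every variable listed in the rules table in one pass.
recode_from_sheet <- function(data, details) {
  out <- list()
  for (v in unique(details$variable)) {
    rules <- details[details$variable == v, ]
    src <- as.character(data[[rules$source[1]]])
    # match() returns NA for codes with no rule, so unmapped values become NA.
    out[[v]] <- rules$to[match(src, rules$from)]
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

recoded <- recode_from_sheet(raw, details)
print(recoded)
```

Because the rules live in a table rather than in code, adding a variable or a category means adding a row, and the same table doubles as documentation of what was recoded.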

What’s included in recodeflow?

The recodeflow package includes:

We’ve also created the following documentation to help you understand recodeflow:

Where is recodeflow used?

Currently, recodeflow is used in packages that harmonize health surveys and health administrative databases.


Projects on the roadmap are listed under the Projects tab of the recodeflow GitHub repository.


Please follow the recodeflow contribution guide if you would like to contribute to the recodeflow package.