Scheduling R Scripts

Comic from XKCD

Inspiration for this post

Since starting my job as a data scientist, I found myself setting dozens of reminders on my calendar to run certain scripts every day / week / month. I figured that there had to be a better way to deal with this, and after some research, I started off with creating cron jobs. After a few months of these running (with varying success), I learned about Launchd as well. This post will explain my process for using both of these scheduling tools to run R scripts at scheduled times.

Cron

Cron is a utility in Unix-like operating systems that can schedule a command to run automatically at a scheduled time. Cron is driven by a crontab file, so in order to schedule an R script, I needed to create a crontab file. Each line of a crontab file has the following syntax:

# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday;
# │ │ │ │ │                                   7 is also Sunday on some systems)
# │ │ │ │ │
# │ │ │ │ │
# * * * * * command to execute

Any of the time arguments that are left as * are treated as a wildcard, so if you schedule 20 10 3 * * command to execute, that command will run at 10:20 on the 3rd of every month.

Beyond figuring out the timing of your job, you also need to figure out what command to execute. Since my work is primarily in R, I wanted to figure out how to execute an R script via cron. If the script in question can be found at /Desktop/my_script.R, for example, the command will be:

/usr/local/bin/Rscript /Desktop/my_script.R

You can test this by running just this command in Terminal (without worrying about cron yet) and seeing if your script runs.

So now, you can assemble the syntax for the cron job. If you want myscript.R to run at 10:20 on the 3rd of every month, the cron syntax will be:

20 10 3 * * /usr/local/bin/Rscript /Desktop/my_script.R

You can check your syntax using this handy website, which helps you generate the proper cron syntax given a desired schedule and script.

Once you have your script running and the syntax ready for the schedule you want, you are ready to edit the crontab on your computer. To do so, open up Terminal and run:

crontab -e

When you type i, you will enter insert mode, and you can paste your cron sytax into the file. When you are done editing, hit ESC and then type :wq to save and quit.

Emailing

When I created my first cron job, I was pretty skeptical about the likelihood of it actually running when scheduled, so I wanted to add the ability to get an email about the script’s progress. To do that, you can add to the MAILTO option, so your crontab would look like this:

MAILTO="myemail@email.com;mybackupemail@gmail.com"
20 10 3 * * /usr/local/bin/Rscript /Desktop/my_script.R

This will email you on successful runs of the script as well as errors, so if your script is set up to run hourly, you might not want all those emails. This blog post includes lots of additional helpful information about automating R scripts with cron, including how to receive emails for errors only.

As far as my experiences with cron go, I did not always receive the emails, even if the script did in fact run properly. I seemed to only receive the emails while I was at work, not at home (no idea why that happened or if there was any causal reason for it). Beyond this weirdness, the main issue with cron is that your jobs will not run if your computer is asleep at the scheduled time.

Launchd

After some of these various frustrations with cron, I learned that launchd is the preferred tool in macOS to execute scripts in a scheduled way. If your computer is asleep when the scheduled job is supposed to occur, the job will still run when your computer wakes up. (If your machine is off at the scheduled time, however, the job will not run until the next scheduled time.) There’s a lot I don’t understand about launchd (for a great overview, check this Medium article), but I broke the process down into 3 steps:

  1. Create a shell script to execute the R script
  2. Create a .plist file to schedule the running of the shell script
  3. Load the job

Create the shell script

The shell script is basically going to look exactly the same as the command you would have used for the cron job. I saved these scripts in the ~/ directory, although you can probably save them anywhere as long as you keep track of the paths. The shell script should look something like this:

#!/bin/sh
/usr/local/bin/Rscript /Desktop/my_script.R

The first line #!/bin/sh tells Unix how the file is supposed to be executed.

Then run the following in terminal to make the shell file runnable:

chmod +x launchd_shell_script.sh

Create a .plist file

The .plist files that launchd uses are special XML files that allow you to specify the script to run and the schedule to run it on. I followed the following syntax for my .plist files:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <!-- Should be the same as the filename without the extension -->
    <string>com.mylaunchjob</string>
    <key>ProgramArguments</key>
    <array>
        <string>~/launchd_shell_script.sh</string>
    </array>
    <!-- Schedule regular runs -->
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>10</integer>
        <key>Minute</key>
        <integer>20</integer>
        <key>Day</key>
        <integer>23</integer>
    </dict>
</dict>
</plist>

Some components to note:

  • The first string contains the name of the launch job in question (so this job would be saved as com.mylaunchjob.plist)
  • Within the array, you need to put the name of the shell script you just created
  • You can put the arguments to schedule the job under StartCalendarInterval

Save your .plist file as ~/Library/LaunchAgents/com.mylaunchjob.plist.

Load the job

The final step is to run the following command in Terminal:

launchctl load ~/Library/LaunchAgents/com.mylaunchjob.plist

Your job is now ready to go at the scheduled times.

Avatar
Cecina Babich Morrow
Compass PhD Program

My research interests range from harnessing data to improve mental healthcare to understanding global patterns of macroecology.

comments powered by Disqus

Related