R packages and Git/GitHub
Git is a ‘versioning’ or ‘revision control’ system. These integrate into software projects to, among other things, record incremental changes at developer-chosen milestones. This makes it easy to revert to a previous version of the code, should something go wrong. GitHub is a web-hosted platform for sharing projects developed with git (in particular, it allows multiple developers to contribute to the same project, though that is generally beyond the scope of my work). It is an appealing place to host an ‘in-development’ R package since:
- It serves as an off-site backup of your precious code
- With the excellent ‘devtools’ package, it is very easy to install R packages hosted on GitHub from within R, using its install_github() function. GitHub therefore provides an excellent way to share your package with its users (e.g. your collaborators): you don’t need to email them the package every time you update it, just let them know to re-install from GitHub.
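To make the ‘revert’ idea above concrete, here is a minimal command-line sketch in a throwaway repository. The file R/add_one.R, its contents and the identity values are hypothetical placeholders: a working function is committed, a faulty edit is committed on top of it, and the file is then restored as it was one commit earlier (HEAD~1) with git checkout.

```shell
#!/bin/sh
set -e
# Throwaway repository in a temporary directory (all names are hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Milestone 1: a working function.
mkdir R
printf 'add_one <- function(x) x + 1\n' > R/add_one.R
git add R/add_one.R
git commit -q -m "Add add_one()"

# Milestone 2: an edit that turns out to be a mistake.
printf 'add_one <- function(x) x + 2\n' > R/add_one.R
git commit -q -am "Change add_one()"

# Something went wrong: bring back the file as it was one commit ago (HEAD~1).
git checkout HEAD~1 -- R/add_one.R
cat R/add_one.R
```

The restored file is staged, so committing it records the rollback as a new milestone; git revert HEAD is the alternative when you want to undo an entire commit.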
RStudio does an excellent job of enabling development of R packages with git, and of syncing changes to GitHub. However, largely due to my inexperience with git, I always find the initial setup a bit fiddly and sensitive to the order in which things are done. Hence this post, which serves as notes to self for future R packages. The rest of this post assumes that
- You have a GitHub account
- Git is already set up on your system. See this excellent LifeHacker tutorial for how to set up Git (and GitHub), and for an expanded ‘beginner’s’ explanation of what they are.
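A quick way to check the second assumption from a terminal is sketched below. It only reads settings: git --version confirms git is installed and on the PATH, and git config reports the name and email that git will record in each commit. The ‘Your Name’ and ‘you@example.com’ values in the reminders are placeholders.

```shell
#!/bin/sh
# Is git installed and on the PATH? Prints something like "git version 2.x.y".
git --version

# Which identity will git record in commits? git config exits non-zero
# when a key is unset, so fall back to a reminder in that case.
git config user.name  || echo "user.name is not set: git config --global user.name \"Your Name\""
git config user.email || echo "user.email is not set: git config --global user.email \"you@example.com\""
```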
Now, here is how to set up an R package for development in RStudio and sharing via GitHub.
1) Initiate an R package, with git
Click File -> New project… -> Create project from: ‘New Directory’ -> Project Type: ‘R package’. Now enter a name and add any R functions you want to kick the package off. Be sure to tick the ‘Create a git repository’ checkbox.
Now we have an R package, with git set up locally. Whenever this ‘project’ is loaded in RStudio, the top right panel has a ‘Git’ tab; click on it to see the project’s files and their git status.
Before going further, it’s worth briefly explaining the concept of your ‘local git repository’. This lives in a folder called .git inside the R package directory we just created; like other names starting with a ‘.’, it is hidden by default. It contains a record of the files and folders you want associated with the git project. This need not be every file and folder in our R package directory; we must decide what we would like to include in (or ‘commit’ to) the git repository, and ultimately share with the world.

Click on the ‘Commit’ button, and in the window that opens you will see, on the left, all the files and folders present in the local R package directory. Yellow indicates that, so far, none of these have been added to git. Now check the files and folders you want to include in the git repository. We obviously need everything necessary for the package, so I checked the ‘DESCRIPTION’ and ‘NAMESPACE’ files, and the ‘R’ and ‘man’ folders. Before we actually add/commit them to the local git repository we need to enter a commit message. This seems a bit pointless now, but in the future it will describe each change to your users: perhaps a new function, or a bug fix.
Finally click ‘Commit’; we will be notified that our chosen files and folders have been added to the local git repository. Next, we want to get this stuff onto GitHub…
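For reference, the clicks in this step have command-line counterparts. The sketch below works in a throwaway directory, with a hypothetical minimal package skeleton standing in for what RStudio generates (file contents and identity values are placeholders). It initialises a repository, shows the hidden .git folder with ls -a, and stages and commits the same four items.

```shell
#!/bin/sh
set -e
# Hypothetical package skeleton in a temporary directory, standing in for
# what RStudio's "R package" project wizard generates.
pkg="$(mktemp -d)/GithubTest"
mkdir -p "$pkg/R" "$pkg/man"
cd "$pkg"
printf 'Package: GithubTest\nVersion: 0.0.1\n' > DESCRIPTION
printf 'export(hello)\n' > NAMESPACE
printf 'hello <- function() print("Hello, world!")\n' > R/hello.R
printf '\\name{hello}\n' > man/hello.Rd

# Counterpart of ticking "Create a git repository".
git init -q
# Identity for this throwaway repository only (placeholder values).
git config user.name "Example Author"
git config user.email "author@example.com"

# ls -a reveals the hidden .git folder that holds the local repository.
ls -a

# Counterpart of ticking DESCRIPTION, NAMESPACE, R and man in the Commit window...
git add DESCRIPTION NAMESPACE R man
# ...then typing a commit message and clicking "Commit".
git commit -q -m "Initial commit: package skeleton"
git log --oneline
```

git add is the command-line form of ticking the boxes, and git commit -m supplies the message in the same go.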
2) Create a GitHub repository to sync the R package to
Now, log into your GitHub account. From the main page, click the “+New Repository” button.
I’ve chosen the same name as the R package. Enter a brief description for the benefit of others. When I wrote this, a free GitHub account meant the code had to be public; free accounts can now host private repositories too. I left the README and license options UNCHECKED: for the (simple) steps described below, the online repository must be completely empty (these files can be added later).
Another nice thing about not adding the README at this stage is that GitHub then shows the git commands we will need next. I recommend copying the commands under ‘Push an existing repository from the command line’ to a text file for use in a minute.
3) Link the local git repository with the repository on GitHub
So, we now have a git repository locally, to which we have added the core files and folders of our project. We have also created an empty git repository online to sync our R package to. But neither repository is aware the other exists, so now we link them. Back in RStudio, click Tools -> Shell to open a terminal/command prompt in the directory of your new R package. You can see the hidden .git folder that holds the local repository by typing
ls -a
Now enter the following command (or paste it from what you copied earlier) to link the local repository with the GitHub one:
git remote add origin https://github.com/yourusername/PackageName.git
Replace ‘yourusername’ with your own GitHub username, and ‘PackageName’ with whatever you called your GitHub repository. In my case this command looked like:
git remote add origin https://github.com/pjnewcombe/GithubTest.git
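Now the local repository knows where to send files and edits. Enter the following to push the initial files and folders to GitHub and, via the -u flag, set the local master branch to track the GitHub one once and for all (if git branch shows that your local branch is called main rather than master, use main instead):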
git push -u origin master
This sends everything we committed to the local repository over to the one on GitHub. In my case this led to the following terminal output:
Counting objects: 9, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (9/9), 1.91 KiB | 0 bytes/s, done.
Total 9 (delta 0), reused 0 (delta 0)
* [new branch] master -> master
Branch master set up to track remote branch master from origin.
The local and GitHub repositories are now in sync!
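Steps 1 and 3 can be rehearsed entirely offline. In the sketch below, a local bare repository in a temporary directory is a hypothetical stand-in for the empty GitHub repository and plays the role of origin, so no GitHub account is involved; file contents and identity values are placeholders. Note that GitHub’s suggested commands for new repositories now rename the branch to main (git branch -M main); since this post uses master throughout, the sketch pins that name with git branch -M master. Finally, git remote -v and git status -sb confirm the link.

```shell
#!/bin/sh
set -e
work="$(mktemp -d)"

# Hypothetical stand-in for the empty GitHub repository: a local bare
# repository. On GitHub the URL would be https://github.com/yourusername/PackageName.git
remote="$work/GithubTest.git"
git init -q --bare "$remote"

# A local repository with one commit, as at the end of step 1.
mkdir "$work/GithubTest"
cd "$work/GithubTest"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
printf 'Package: GithubTest\n' > DESCRIPTION
git add DESCRIPTION
git commit -q -m "Initial commit"
# Pin the branch name used in this post; newer setups may default to main.
git branch -M master

# Step 3: link the repositories, then push and set up tracking (-u).
git remote add origin "$remote"
git push -q -u origin master

# Check the link: the remote's URL, and whether the branches are in sync.
git remote -v
git status -sb
```

With a real GitHub repository, the only change is the URL passed to git remote add origin.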
4) Pushing future additions/edits to GitHub from within RStudio
Suppose that while working on the R package in RStudio we add a file to the directory (perhaps a README.md, which GitHub displays on the repository’s front page), and we edit one of the package’s functions. In RStudio’s ‘Git’ tab you can see that “README.md” has appeared with a yellow box (git has detected the new file in the project folder, and is indicating that it is not yet included in the repository). There is also a blue box next to my function file: RStudio/git are reminding me it has been modified since the last commit.
As before, we click the ‘Commit’ button (the one with the tick next to it), then check the boxes next to these files and add a commit message. Note how RStudio highlights the modifications to the function file when it is selected. Very cool.
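The coloured boxes and highlighting in RStudio correspond to information git reports on the command line. The sketch below recreates this situation in a throwaway repository with hypothetical file contents: git status --short marks the new README.md with ?? (untracked) and the edited R/hello.R with M (modified), and git diff shows the line-by-line change.

```shell
#!/bin/sh
set -e
# Throwaway repository with one committed function file (hypothetical names).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
mkdir R
printf 'hello <- function() print("Hello, world!")\n' > R/hello.R
git add R/hello.R
git commit -q -m "Add hello()"

# The two changes from this step: a new README and an edited function.
printf '# GithubTest\n' > README.md
printf 'hello <- function(name) print(paste("Hello,", name))\n' > R/hello.R

# Short status: "??" marks an untracked file (the yellow box),
# "M" a tracked file modified since the last commit (the blue box).
git status --short

# Line-by-line changes, like RStudio's highlighting in the Commit window.
git diff R/hello.R
```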
We click ‘Commit’ again to add them to the local repository. Thanks to our work above, the local repository now knows about the GitHub repository. At the top left of the window there is a warning that the local repository is ahead of the version on GitHub: it knows we haven’t synced the changes yet.
Also notice the ‘Pull’/‘Push’ arrows in the top right, which were greyed out before. Since we have linked the repositories, we can now use the ‘Push’ arrow to push the changes to GitHub.
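The command-line counterpart of these last steps is sketched below, again using a hypothetical local bare repository in place of GitHub and placeholder file contents. After a new local commit, git status reports that the branch is ahead of origin/master, matching RStudio’s warning; because -u set up tracking in step 3, a plain git push then sends the commit, much as the ‘Push’ arrow does.

```shell
#!/bin/sh
set -e
work="$(mktemp -d)"

# Hypothetical local bare repository standing in for GitHub (see step 3).
remote="$work/GithubTest.git"
git init -q --bare "$remote"

# Local repository already linked and pushed, as at the end of step 3.
mkdir "$work/GithubTest"
cd "$work/GithubTest"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
printf 'Package: GithubTest\n' > DESCRIPTION
git add DESCRIPTION
git commit -q -m "Initial commit"
git branch -M master
git remote add origin "$remote"
git push -q -u origin master

# Step 4: commit a new file locally.
printf '# GithubTest\n' > README.md
git add README.md
git commit -q -m "Add README"

# Git's version of RStudio's warning; among other lines it prints
# "Your branch is ahead of 'origin/master' by 1 commit."
git status
# Number of local commits not yet on the remote, saved for checking.
ahead="$(git rev-list --count origin/master..HEAD)"

# The "Push" step: with tracking set by -u in step 3, a bare push suffices.
git push -q
```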