A Appendix A: Using Git

Git is a version control system that can be used to deliver software, program code, data files, and other useful files. It is also facilitates updating code. For example, if I make a change to a file, I push it to the repository, and then users of the repository can download the changes to their working copy. The system will keep track of versions and manage mergers without conflicts. If it can’t resolve conflicts across versions, then the program let’s you know about the conflicts and you can choose to keep the conflicts as is or update to the newer version.

A.1 Installing git

Here I separate installing on PC and Mac. There are some issues that occur on each platform. For more information on RStudio and git see the webpage Using git from RStudio,

A.1.1 Git on a PC

Git doesn’t usually come installed on a PC. You can download it at http://git-scm.com/downloads.

There may be an issue of RStudio finding the path to the git executable, so within RStudio you may need to go to Tools/Global Options/Git and select the path to the git.exe file (likely c:\Program Files\Git\bin).

A.1.2 Git on Mac

If you already have xcode installed, then you are ok because git comes with xcode. If you don’t have xcode installed, then you can either download and install xcode from the apple website or go to http://git-scm.com/downloads and download/install the git package directly. One way to check if git is already installed, is to open a terminal window (Applications/Utilities/Terminal). At the prompt type

which git

If git is installed it will report the directory path. All you have to do is enter that directory path in RStudio under Tools/Global Options/Git so that RStudio can find the git program. You open up a terminal window and click to the path. The path may look something line /usr/bin/git. That is not a mispelling there is a folder called /usr (not /users). On more recent macs, many utility folders like /usr are hidden. To see them from the dialogue box (like when you open a window to select a folder and you see a list of possible folders), enter command-shift-. (the last symbol is period). Now you may see a few more folders pop up, just navigate to the path you found from which. Usually to get to /usr you need to get the highest path possible, not your home directory. Your home directory is in /users/YOURNAME but the /usr folder is one level up, on par with /users.

Some mac users reported that even after installing git RStudio reported not finding git. The kludgy solution appears is to point the Rstudio git to the default path. In RStudio from Tools/Global Options/Git click on Browse to change the path, go to /Applications/Xcode and select Xcode. The path will change to Xcode.app. Then click Browse again (yes, again as second time is a charm) and click on contents/developer/usr/bin and click on git. The path will change, save, exit RStudio and restart. Git should now work.

Some students in previous workshops reported that if they already have xcode installed, then they have problems running git if they separately downloaded git from the website. I guess the mac gets confused about which git it should run. Not sure how wide spread this issue can be but two students in the past reported having this issue.

Another student reported success using SSH instead of https. This involved setting up an ssh key, cutting and pasting to the bitbucket site (go to your primary settings page and click on ssh keys) and switching from https to ssh (which is in the upper right of the bitbucket repo page). One additional student had success with this but also needed to execute ssh -T at the Mac terminal window to enable “handshaking” between their computer and the ssh key on bitbucket.

I wish I had a simpler set of instructions for Mac users. I’m a Mac user too and I’ve never had these problems so I’m going mostly with what I’m hearing from students.

A.2 Git Commands (both PC and Mac)

Some things to check include running git config -l to see your local configuration. These commands are typed in the “terminal” section of RStudio (not the R console where you type R commands). You could also type them into the terminal window on your Mac or PC.

You may need to set username and email by typing the following commands in the terminal section of RSstudio:

git config –global user.name “John Doe”

git config –global user.email

If you want to pull my covid-19 repository code that includes the files to produce this book, you can issue the following two commands in the terminal section of RStudio:

git remote add origin https://github.com/gonzoum/covid19-analyses

git pull origin master

You would need to be in the folder where you want to save this repository because the ``git remote add’’ command downloads the repository to the current folder.

To download the Johns Hopkins data repository described in Chapter 3, change to a different folder and execute these two commands:

git remote add origin https://github.com/CSSEGISandData/COVID-19

git pull origin master

The reason you need to switch to a different folder is that you can’t have two git repositories in the same folder because they would conflict.

A.3 Brief Git Explanation

The idea is simple. I push (copy) my changes to the repository and you pull (download) the repository to your computer. The software keeps track of all changes that are pushed to the git server, who made those changes, and any notes that the person wants to attach to those changes. The software keeps track of what version you have installed locally on your machine and lets you know if there are more recent versions on the server. If there are conflicts, such as you may have edited a line of a file that another person working on the same file may have also edited (assuming they commited their edits and pushed them to the server), then the software points to the two versions and asks you want you want to do such as accept their edits, accept your edits, etc.

A.4 Using Git beyond this book

I strongly recommend that you use Git as part of your code writing, especially if you collaborate with others including your advisors. You can use git with any text files not just with R. For example, if you write SAS syntax, SPSS syntax, Stata code, whatever, you can use git to keep track of all the files.

Git has the ability to “pull” and “push” changes to the server. This is great for keeping track of who made what changes to a document. But it does not work well with MS Word documents.

Git keeps a complete record of all the changes done on all the files relevant to the project. No more emailing each other about who has the most recent copy. Dropbox and related platforms may work well for some things but they do not perform well for collaboration over writing code as one does in data analysis.

There are several interfaces available to work with Git with many more features than are currently implemented in Rstudio. I like SourceTree (http://www.sourcetreeapp.com/) but there are many other utilities.

You can create your own repositories using the popular GitHub or BitBucket websites. BitBucket has more options for free private repositories, and academics with an edu in their email get special free academic upgrades (see the BitBucket website for details).