Version Control using Git and GitHub (Part 1)
Git is an open source distributed version control system. It was developed in 2005 by Linus Torvalds, the creator of the Linux kernel. It is designed to handle both small and very large projects with speed and efficiency. In this article, we will look at what versioning, version control and a version control system are. Then we will point out the differences between Git and GitHub. After that, we will touch upon the three stages in Git and the extra, fourth stage that comes with GitHub. Finally, we will go through how to use Git. This part covers only version control using Git; the next part will explain configuring and using GitHub.
With that, let us start.
What are versioning, version control and a version control system?
Versioning is the process by which multiple releases of operating systems, software and so on are created and managed. Each subsequent release has the same general function but is usually improved: features are added, unnecessary features are removed, or the product is customized.
Version control, or source control, is the process of tracking and managing changes over time. It also enables collaborative sharing and editing of data among the users of a system.
A version control system (VCS), on the other hand, is software that keeps track of and manages changes to a file or a directory.
Why the fuss?
Without a VCS, working simultaneously and collaborating on a project would be very difficult. There would be a single folder containing the project, sitting in a central location. While one team member works on a file, the others would have to keep their hands off it. This process is error prone, as it is easy for someone else to overwrite a change that has already been made.
Saving a version whenever a file is changed is important, because if the change turns out to be unacceptable we can always get back the file as it was before the change. Unfortunately, without a VCS, this process becomes confusing and cumbersome. Different people have different methods of versioning, and with multiple people working on the same project it is quite easy to lose track of which version a file has to be reverted to. A VCS keeps track of the changes in a systematic manner, together with a brief description of what each change was, allowing us to easily revert to the required version of the file.
A notable side effect of using a VCS is that it serves as a backup. With a distributed VCS such as Git, a local repository resides on each team member’s machine in addition to the central repository. If the central repository goes down, there are multiple copies on the team members’ machines from which the project can be restored, provided all of them have regularly synchronized their changes.
There are many different VCSs. Git is one of the most popular and widely used among them. In this article, I have tried to compile the basics required to use Git for version control. Before jumping into it further, note that people often use the terms Git and GitHub interchangeably. Are they one and the same? Let’s look into that in the next couple of paragraphs.
Git and GitHub: are they the same?
Git is a version control system which lets you track the changes you make to your files over time. With Git, you can revert your files to earlier states (like a time machine). You can also make a copy of your files, make changes to that copy, and then merge these changes into the original.
GitHub is an online hosting service for Git repositories. Imagine working on a project in your office. Later, at home, you remember something that requires a file to be modified. If the project is hosted on GitHub, the files can be accessed, the changes made, and the files pushed back to GitHub. In summary, with GitHub the repository is stored on their platform. Another feature that comes with GitHub is the ability to collaborate with other developers from any location.
Thus Git allows us to track the changes made to files on a local system, whereas GitHub makes those files accessible from anywhere. GitHub requires Git to work, but Git does not require GitHub.
With this, let’s start using Git: configuring it, tracking files and committing changes. Later we will see how to use GitHub to host our files remotely and how to push changes made locally to the files hosted on GitHub.
States and Stages in Git
Before using Git, it is necessary to understand the different states and stages in Git. At a high level, a file in a repository can be in one of two states:
- Untracked by Git: Untracked files are those that have been created within your repository’s working directory but have not yet been added to the repository’s tracking index using the git add command.
- Tracked by Git: Tracked files are those that exist in Git’s tracking index.
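To make the two states concrete, here is a minimal sketch that lists tracked and untracked files separately. It assumes Git is installed; the scratch directory and the file names notes.txt and todo.txt are made up for illustration.

```shell
# Scratch repository; the directory and file names are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q

echo "first draft" > notes.txt    # created, but not yet added
echo "second file" > todo.txt

git add notes.txt                 # notes.txt enters the tracking index

# Files that exist in the index (tracked).
tracked="$(git ls-files)"
# Files in the working directory that Git is not tracking.
untracked="$(git ls-files --others --exclude-standard)"

echo "tracked:   $tracked"
echo "untracked: $untracked"
```

Only the file passed to git add shows up as tracked; the other one stays untracked until it is added as well.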
When a project is under Git version control, there are three stages:
- Working Directory: This is the directory in which the project or its files reside. The files in it may or may not be tracked by Git.
- Staging Area: From the working directory, the files to be tracked by Git are selected. The staging area is the playground where you group, add and organize the files to be committed to Git for tracking their versions.
- Git Directory: Once the files to be committed are grouped and ready in the staging area, we commit them together with a commit message explaining what the commit is about. When committed, a snapshot of the files in the commit is recorded by Git. The information related to this commit is stored in the Git directory. Thus, the Git directory is the database where the history of the project files and its metadata are kept.
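The three stages can be followed for a single file with the short form of git status. This is only a sketch: the file name report.txt and the user name and email are placeholders, set locally so that the commit works on a machine with no Git configuration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
# Placeholder identity (an assumption), stored only in this scratch repository.
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > report.txt
in_working_dir="$(git status --short)"   # "?? report.txt": only in the working directory

git add report.txt
in_staging="$(git status --short)"       # "A  report.txt": staged for the next commit

git commit -q -m "Add report"
after_commit="$(git status --short)"     # empty: the snapshot is now in the Git directory

printf '%s\n' "$in_working_dir" "$in_staging" "[$after_commit]"
```

The status code in front of the file name changes from ?? (untracked) to A (added to the staging area), and the file disappears from the status output once it has been committed.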
We know that GitHub is needed when you have to collaborate and publish your code to a team or community. Thus, when using GitHub there is one more step in addition to the three discussed above: once the changes are committed, they have to be pushed to a remote repository. This is so that the other team members can use and modify the code, and commit and push their own changes to the same remote repository.
Now let’s see how this is done…
Using Git
The following paragraphs give an idea of how to use Git for version control.
Installing Git: To use Git, it first has to be installed. Assuming a Debian-based Linux system that uses apt [I have Ubuntu installed and have tried everything out on Ubuntu; this article is mostly about command line usage], either of the following commands can be used to install Git in case it is not already installed
$ sudo apt-get install git
(or)
$ sudo apt-get install git-all
Once Git is installed, we are ready to go. Type in
$ git --version
to get the version of Git installed.
Initializing Git: In this section we will look at how Git can be initialized to track a particular folder. Assume that we have a folder called Project [Working Directory] in which the files reside and which needs to be tracked. This folder can be located anywhere on the local machine. Move into the Project folder using the terminal. To make the Project folder a Git repository, type in the command
$ git init
Executing this command converts the folder into a Git repository. Looking into the folder, including the hidden files, we see that a new sub-folder called .git has been created. This sub-folder holds all the metadata needed to keep the project under version control, including information about commits, the remote repository address, etc. It also stores the commit history, so that the project can be rolled back if required. Thus the git init command is used to convert an existing, unversioned folder into a Git repository or to initialize a new, empty repository.
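As a quick check of what git init does, the following sketch creates a Project folder in a temporary location (the location is an assumption made for this example) and confirms that the hidden .git sub-folder appears.

```shell
# A throwaway Project folder; in practice this would be your own project folder.
project="$(mktemp -d)/Project"
mkdir -p "$project"
cd "$project"

git init -q
ls -a      # the listing now includes the hidden .git sub-folder

# Ask Git itself whether we are inside a repository's working directory.
inside="$(git rev-parse --is-inside-work-tree)"
echo "inside a Git work tree: $inside"
```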
Moving files to Staging Area: Once the Git repository is initialized, we can add files to it; here the repository is the ‘Project’ folder. Files newly added to the folder are untracked. In order to track them, they first have to be moved into the staging area. This can be done using the following command
$ git add <filename> - to add a particular file in the working
directory to the staging area
(or)
$ git add . - to add all the files in the working directory to the
staging area
Once the command is executed successfully, type in
$ git status
Executing the above command gives an output similar to the one shown in the figure below
In the above figure, we see that two files have been added to the staging area. Their file names are highlighted in a different colour. The message also tells us that there have been no commits yet, which means that the files have only been moved to the staging area and are not yet recorded in any commit.
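The difference between adding one file and adding everything can be tried out with the sketch below. The file names agenda.txt and budget.txt are invented for the example, and git status --short is used so that the result fits on a couple of lines.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

echo "agenda" > agenda.txt
echo "budget" > budget.txt

git add agenda.txt                  # stage a single file
after_one="$(git status --short)"   # agenda.txt staged (A), budget.txt untracked (??)

git add .                           # stage everything in the working directory
after_all="$(git status --short)"   # both files staged (A)

printf '%s\n' "after 'git add agenda.txt':" "$after_one"
printf '%s\n' "after 'git add .':" "$after_all"
```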
Moving the files to Git Directory: For Git to start recording versions of the files, they have to be committed to the Git Directory. This is done by executing the following command
$ git commit -m "Commit Message"
In the above, -m indicates that the commit message follows. On executing this command we get a message similar to the one in the following figure
We see that the files which were in the staging area have now been committed and are being tracked by Git. The insertions and deletions in the message count the lines added and removed by the commit. Let’s first type in git status now and see what happens
We see that Git returns a message that there is nothing to commit. Let’s make a change to the file called ‘Invitation_host.docx’ by adding some random text to it. I will also add another file called ‘Example.pdf’ to the working directory, without adding it to the staging area. With these changes made, I type in the git status command
The above figure shows that the file ‘Invitation_host.docx’ has been modified and that there is an untracked file ‘Example.pdf’ in the working folder. Now we have to move these files into the staging area using the git add command and then commit. Let’s do that and see the result
Initially the git commit command was executed, for which Git returned a message that the changes are not staged. So the files were staged using git add, and git commit was executed again, at which point both files were committed to the Git Directory. Typing git status at the command prompt yields the output in the following figure
The message in the figure above shows that there is nothing to commit: all the files are in the Git directory and are being tracked.
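The same sequence can be replayed in a scratch repository. In this sketch, plain text files named invitation.txt and example.txt stand in for the files used in the article, and the user name and email are placeholders; none of these are the actual files from the figures.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "version 1" > invitation.txt
git add invitation.txt
git commit -q -m "First commit"

echo "some more text" >> invitation.txt    # modify a tracked file
echo "new file" > example.txt              # add a new, untracked file
before="$(git status --short)"             # " M invitation.txt" and "?? example.txt"

# Nothing is staged yet, so this commit attempt records nothing and fails.
if git commit -q -m "Second commit" >/dev/null 2>&1; then
  first_try="committed"
else
  first_try="nothing staged"
fi

git add .                                  # stage both files
git commit -q -m "Second commit"           # now the commit succeeds
after="$(git status --short)"              # empty: nothing left to commit
commits="$(git rev-list --count HEAD)"

echo "first attempt: $first_try, commits so far: $commits"
```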
Viewing Commit History: In order to see the history of commits, we can use the following command
$ git log
Executing this at the command prompt shows all the commits that have been made, each with a 40-digit hexadecimal number [SHA-1 hash] which is used to identify the revision. Let’s execute git log on the terminal
Executing the command gives the output shown in the figure above. It shows that three commits have been made. Each commit is identified by a unique 40-digit hexadecimal number. The commits are displayed in reverse chronological order, with the most recent commit first.
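A small sketch of the same idea: three commits are created in a scratch repository and then listed. The file name story.txt, the commit messages and the identity are assumptions made for the example, and the 40-digit length assumes a repository using the default SHA-1 hashes.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Make three commits, each appending one line to the same file.
for n in 1 2 3; do
  echo "line $n" >> story.txt
  git add story.txt
  git commit -q -m "Commit $n"
done

git log --oneline                          # one line per commit, newest first

full_hash="$(git rev-parse HEAD)"          # full hash of the latest commit
newest="$(git log -1 --format=%s)"         # message of the commit listed first
total="$(git rev-list --count HEAD)"
echo "latest commit: $full_hash ($newest), $total commits in total"
```

The --oneline option shortens each entry to an abbreviated hash and the commit message, which is handy once the history grows.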
Reset and Revert: If we want to get back to a previous commit, there are two ways to do it: we can either reset or revert a Git commit.
So what is the difference? Each time a commit is performed, a pointer is moved to point to this most recent commit. In case of reset, the pointer moves back to a commit which was performed previously. In case of revert, a new commit is made which undoes the changes of an earlier one. Let’s see how reset and revert are done
$ git reset <hash>
Executing the above we get the following.
Now Git has reset its pointer to point to the very first commit. This can be verified using git log as seen below
It can be seen that the pointer points to the first commit. Now the question arises of how to go back to the latest commit, which was the third. For that, we can use
$ git reflog
This lists the commits that the pointer has previously pointed to, along with their hashes, as shown in the figure below. Only the first 7 digits of each hexadecimal number are displayed, and these suffice to reset to the version required.
Let’s try to reset the pointer to the second commit using git reset. Resetting and then displaying the log using git log gives the outputs shown in the figure below
Now the pointer is reset to point to the second commit. Similarly, we can bring the pointer back to the third commit.
Now let’s see how to work with revert. To do so, we type in the following command
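The round trip with reset and reflog can be sketched as follows. The repository, file name and identity are made up for the example. Instead of copying a hash from the reflog output by hand, the sketch uses the reflog shorthand HEAD@{1}, which names the commit the pointer was on before the last move.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "line $n" >> story.txt
  git add story.txt
  git commit -q -m "Commit $n"
done

first="$(git rev-list --max-parents=0 HEAD)"   # hash of the very first commit
third="$(git rev-parse HEAD)"                  # hash of the latest commit

git reset -q "$first"                          # move the pointer back to the first commit
after_reset="$(git rev-list --count HEAD)"     # git log now shows a single commit

git reflog                                     # earlier pointer positions, abbreviated hashes

git reset -q 'HEAD@{1}'                        # go back to where the pointer was before
restored="$(git rev-parse HEAD)"
echo "commits visible after reset: $after_reset"
echo "restored to third commit: $restored"
```

Note that git reset without further options leaves the files in the working directory as they are and only moves the pointer and the staging area.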
$ git revert HEAD
HEAD is a special pointer that refers to the currently checked-out branch and, through it, to the most recent commit on that branch. Typing in this command opens an editor screen for the commit message of the new commit, as shown below
We can close this file (the commands to close it are listed when the file is open) and type in git log, which gives an output as below
The above figure shows that there is now a fourth commit, and checking the file we would see that its contents are back to what they were in the second commit. The git reset command could still be used to move the pointer back to the first commit. If we want to undo the changes introduced by a specific commit, rather than the latest one, then:
$ git revert <HASH>
Thus the net effect of the Git revert command is similar to reset, but its approach is different. Where the reset command moves the branch pointer back in the chain to “undo” changes, the revert command adds a new commit at the end of the chain to “cancel” changes.
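To see that revert adds a commit rather than removing one, here is a sketch along the same lines as before. The file name, commit messages and identity are assumptions, and the --no-edit option is used so that Git keeps the default commit message instead of opening the editor.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "line $n" >> story.txt
  git add story.txt
  git commit -q -m "Commit $n"
done

git revert --no-edit HEAD                  # undo the third commit with a new, fourth commit

total="$(git rev-list --count HEAD)"       # the history grew instead of shrinking
subject="$(git log -1 --format=%s)"        # default message of the revert commit
content="$(cat story.txt)"                 # file is back to its state in the second commit

git log --oneline
```

All three original commits are still in the log, with the revert commit on top of them.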
Which should we use — Revert or Reset?
If a commit has been made somewhere in the project’s history, and you later decide that the commit is wrong and should not have been done, then Git revert should be used. It will undo the changes introduced by the bad commit, recording the "undo" in the history.
If a commit has been made, but this has not been shared with anyone else and we decide that the change is not required, then Git reset should be used to rewrite the history so that it looks as though you never made that commit.
Finally, in order to remove a file use
$ git rm <Filename>
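This removes the file from the working directory and stages the removal; once the change is committed, the file is no longer tracked, although it can still be recovered from earlier commits.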
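A short sketch of what git rm does in practice, using two made-up files (keep.txt and old.txt) and a placeholder identity. It also shows that the removed file can still be read from the commit made before the removal.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "keep" > keep.txt
echo "obsolete" > old.txt
git add .
git commit -q -m "Add two files"

git rm -q old.txt                          # delete the file and stage the deletion
staged="$(git status --short)"             # "D  old.txt"
git commit -q -m "Remove old.txt"

tracked="$(git ls-files)"                  # only keep.txt is tracked now
from_history="$(git show HEAD~1:old.txt)"  # content as of the previous commit

echo "tracked now: $tracked"
echo "old.txt in previous commit: $from_history"
```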
With this, let me conclude this part. We have seen how a Git repository is initialized and how files are added to the staging area, committed, reset or reverted, and deleted. In the next part, we will take you through configuring GitHub and pushing the files committed in Git to a remote repository hosted on GitHub.