Learn the basic concepts, tools, and features that you will need to manage source code using the git version control system.
Develop your git skills with this holistic course for git beginners
Powerful Version Control at Your Fingertips
Learning the fundamentals of git puts a powerful and very useful tool at your fingertips. Git is free, powerful, has excellent documentation, and is quickly becoming the de facto standard for distributed source code version control.
In the 2014 Eclipse community survey, Git surpassed SVN as the most popular version control system. Git is used in open source and enterprise applications around the world, and being able to effectively manage source code using git will make you a valuable asset to any organization using it.
Content and Overview
Suitable for programmers of all skill levels, this 30-minute crash course teaches you all of the git fundamentals and establishes a strong understanding of the concepts behind version control in general and git in particular. Each chapter contains screencasts that you can follow along with, and contains extra resources to put your newly learned skills into practical use immediately.
Starting with an overview of version control systems, this course will take you through the core concepts of git and the basic commands you'll need to interact with git. By manipulating real repositories, you'll gain an understanding of not only "how" to interact with git, but also "why" it works the way it does.
With these concepts mastered, the course will take you through standard git workflows, showing you the basic commands for creating new repositories, interacting with existing repositories, creating and managing branches, and resolving merge conflicts.
Students completing the course will have the knowledge to create and manage git repositories in any language and platform.
Hey there everyone. My name is John O'Connor and welcome to CodeYourFaceOff! Today, we're going to talk about git.
Before we dive right in, I'd like to briefly cover what we'll be going over in this series and who these videos are for. This series is designed to be a holistic and approachable beginner's guide to the git version control system. As an experienced online mentor and educator I've worked with a lot of developers of all skill levels. One common thread is that git is a difficult but worthwhile skill to learn, so I created this guide to help developers that are new to git.
This guide is suitable for beginner programmers - however, seasoned developers that are looking for a solid introduction to using git will also find this guide useful.
Some of the things we'll cover in this video are:
* Version control systems, and how git compares to other ones.
* How to install git and how git repositories work
* How to interact with git repositories using terminal commands
* How to work with other developers using git
* and a lot more.
Git can seem quirky at times and I've found that understanding some of what git is doing under the hood demystifies a lot of that quirkiness. So that's how this guide is set up.
I hope you enjoy and let's get started.
In order to gain a better understanding of git, it's helpful to first talk about the problem that git is trying to solve.
It all starts here with Alice.
Alice is a developer working on a team that's developing the "next biggest thing" mobile app.
This app is pretty big, so Alice finds that she often needs to collaborate with other developers on her team. To do this, they need to share their code.
In the beginning, her team would share code by putting the code on a shared file server. When Alice was done with code, she would upload it to the server for everyone on the team to see. This worked well... for a while.
One day however, Bob and Alice were working on two separate tasks that had them both working in the same file. Bob made his changes and uploaded them to the shared server. However, Alice was already in the process of updating her copy of the file and she didn't realize that Bob had made an update.
When Alice uploaded her version of the file, it overwrote Bob's version. Since Alice didn't have any of Bob's changes, all of Bob's hard work was now lost forever.
The first few times this happened to the team, they were easily able to re-write the missing changes and fix the problem. But then one day a very large chunk of code was removed and it took a week to re-write it all. The team was losing productivity and something had to be done about it.
After some searching for a solution, Alice discovered something called a "version control system". A Version Control System is a piece of software that keeps track of a file as it's changed. Each time the file is changed, the software adds a new "version" to the Version Control System. Using a Version Control System a team can ensure that no changes are lost. Even if someone accidentally deletes a file completely, the team can simply grab a previous version of the file from the Version Control System and they're back in business.
Using their new Version Control System, Alice can now work on her code and save a version in the Version Control System. When Bob wants to work on that same file, he retrieves the latest version from the Version Control System. Doing this ensures that Bob and Alice are always working from the latest version of the code and no changes are ever lost.
There are many types of version control systems that solve the problem of storing file versions in many different ways. In this section we'll discuss how version control systems work at a high level and create some classifications for the types of version control systems out there so we can better understand how "git" fits into the version control landscape.
The first type of version control system is an "ad-hoc" version control system. In this version control system, developers save different versions of their files with different file names (usually adding the date or a version number into the name of the file). For example, "MyCoolFile.txt.v1" and "MyCoolFile.txt.v2" would be stored in the same folder and would represent two different versions of the same file.
If Bob and Alice wanted to work together on the same files, they'd have to remember to save each version of their files with a new version number.
This type of version control system is, for obvious reasons, very inefficient. Firstly, developers have to keep track of the file versions themselves. Any mistake in naming the file can result in the loss of a version (for example, if a developer accidentally names his file "v2" when there is already a "v2" of a file).
Also, saving the same files multiple times requires a great deal of storage space. Imagine if each file was 100MB, and you had 100 versions of the file. You now have 10GB of storage needed to store just 1 file. Some repositories (that is, version control systems) store thousands of versions of hundreds of files.
This method is the easiest to implement, but is far too cumbersome to be effective.
Most modern Version Control Systems don't store multiple versions of a file, but rather store a single copy of a file and then keep track of the differences between each version. They then layer these differences on top of each other until you get to the current version of the code.
This is a much more efficient model since most of the time the differences between two versions of a file are very small. Rather than having to re-save the entire file again, we're only storing the changes from one file to another.
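To get a feel for why storing differences is cheaper, here's a small sketch that uses the standard diff tool rather than any particular version control system. The file names are made up for the example, and real systems use their own storage formats, so treat this as an illustration of the idea only.

```shell
# Work in a throwaway folder (all file names here are hypothetical).
work=$(mktemp -d)
cd "$work"

# Version 1: a file with 1000 lines.
seq 1 1000 > v1.txt
# Version 2: the same file with a single line changed.
sed 's/^500$/five hundred/' v1.txt > v2.txt

# Saving version 2 as a full copy costs as much space as version 1...
wc -c < v2.txt
# ...while the difference between the two versions is only a few lines long.
diff v1.txt v2.txt > v1_to_v2.diff || true   # diff exits with status 1 when the files differ
wc -c < v1_to_v2.diff
cat v1_to_v2.diff
```

The more versions you keep, the more the savings add up, since each new version only costs as much as what actually changed.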
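Within the modern version control systems there are 3 major categories of version control that you will encounter:
* Centralized version control systems
* Distributed version control systems
* Hybrid version control systems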
Let's explore each of these to get an idea of where git fits into the mix.
The first type of version control system we'll talk about is "centralized" version control. This simply means that the main list of changes, called a "repository", is stored on a central server that all of the developers can access. Each developer must be able to connect to the central server in order to get, or "check out" any version of a file.
If Bob and Alice are working together using a centralized version control system, then when Alice wants to make a change to a file she has to "check out" the file. By default she will automatically get the current, most recently updated version of the file (although all version control systems support checking out previous versions of a file). She then works on the file and, when she's done, does a "check in" to add her changes and create a new version in the VCS.
One of the main issues that comes up with a centralized version control system is: "What if two people need to make changes to the same file at the same time?" The way this question is answered is how we separate a "locking" centralized VCS from a "merging" centralized VCS.
In a "locking" Version Control System (VCS), when a developer wants to work on a file he or she "checks out" the file from the Version Control System. During the time that the file is checked out by a developer, no other developers are allowed to make changes to that file.
While this solves the problem of preventing developers from making changes to the same file at the same time, locking a file has its drawbacks. If developers can't work on the same files together they might be blocked from continuing work while waiting for their turn. If a developer goes on vacation and forgets to unlock a file the entire team can be impacted.
Merging version control systems allow multiple developers to work on the same files at the same time. They do this by making a copy of a file for each developer and allowing them to work on those separate copies. Then, when the users are ready to share their changes, the VCS detects the differences between the files and merges them together into one file.
This works well as long as the merging system can figure out which changes should be merged from each copy. However, if developers make changes to the same files in the same places, it can be impossible for the system to figure out which changes are the most important. In this case, the merging system needs to get human input on which changes it should use.
This condition is known as a "merge conflict" and it means that the changes from each copy of the file need to be manually merged by a person.
One of the major advantages to a "centralized" version control system is that, in order to have access to the code, the developer must have access to the server.
This can be helpful if, say, a developer or contractor leaves your company. If the team is storing versions in files or (as we'll see later) in a distributed version control system, that developer gets to keep all of those files and the company has to trust that the developer is being honest when she says she deleted the files from her computer.
With a central version control system, once the developer loses access to the repository server, they can no longer get access to the code base.
Centralized version control systems also keep the code all in a single location on a remote server, so if the developers change computers or experience a hard drive failure, the code base is still stored safely on the central server.
Despite the advantages, there's a reason that not all version control systems are centralized.
Firstly, the central server is - itself - just a computer. If that central server goes down, the entire repository and all of the code stored in it goes away. Worse still, since the central repository contains all of the code and none of the developers can work without access to it, the entire team has to shut down until the server can come back online. Hopefully it wasn't a catastrophic hard drive failure!
The next form of version control system we'll discuss is the "distributed version control system". Like Centralized Version Control Systems (and all modern version control systems), distributed version control systems still keep track of the different versions of code by tracking the difference between each version of a file rather than separate copies of the file for each version. However, distributed version control systems allow each developer to keep an entire copy of the whole history on their local computers.
The first major advantage to this is that if a developer experiences a fatal hard drive crash, he or she can simply visit a friend that also has a copy of the full repository and get back up-and-running quickly. There is no "main point of failure" in a distributed system since everyone has a full copy of the entire code history.
Another advantage is that it's easy for developers to compare multiple versions of a file without having to download those changes from the central server. He or she can simply open their local copy of the versions and they're all set.
Finally, distributed version control systems are a more democratic way of distributing code. Open Source Software (OSS) is software whose source code is developed by people all over the world, and often is meant to be owned by the community rather than by any individual person or corporation. By ensuring that the entire version history can be accessed by any person at any time rather than being guarded by a central authority, a distributed system ensures that code will stay the property of the community.
Of course, one of the major downsides to having an entire copy of the version history for every developer is that, once they have the code, it's impossible to be sure that a developer or contractor that is no longer on the team no longer has access to the code.
Another problem is that developers must actively seek out someone with the code that they want to work on, and they need to make sure they can connect to that developer's computer. That doesn't seem like a huge problem on a corporate network, but even then it can be a problem if a developer is hosting code on his laptop and takes it home for a week-long vacation.
The final version control system we'll discuss, and the version control system that best describes the most common use of "git", is a hybrid version control system. Hybrid version control systems allow developers to keep a copy of the entire source code version history on their computer, but also allow a company to host a central "origin" server that contains the main copy that everyone syncs up to.
Hybrids are really just a combination of "centralized" and "distributed", and each one fits in a slightly different place on the spectrum. Git is decidedly a mostly distributed version control system, with a few small features that push it into "hybrid" territory.
When you're working with other developers, you might hear them refer to "github" as in "I'm pushing my code to github". It's a popular term that's often used incorrectly so I'd like to clear it up now while we have a chance.
Git is a very popular version control system, and you'll sometimes hear about other services related to git that use the "git" name.
One such popular service is "github".
Github is a website that offers central storage for developers that want to share their code with each other using "git".
"git" is a version control system - basically just a program that you can use to save and share your code, while GitHub is a website that developers use to store their "git" code and share it with other developers.
In this exercise, you'll be creating a github account. This will give you the ability to share your code with the world and, while not a necessary step, will be helpful as we progress through this lecture.
1. Navigate to http://www.github.com/join/
2. Choose a username and password and enter them into the form.
3. Enter a valid email address in the form and click "Create an Account"
4. In the "select a plan" page, you can choose either the free version of github or a paid plan. The free plan will allow you to use all of the features of github and will make all of the repositories you create public. The paid plans include private as well as public repositories.
5. Click on the "next" button to go to your new github dashboard.
Once you've completed these steps, move on to the next lesson.
When you break it down, git is really just a computer program. In this lesson we'll walk through how to install git and the git command shell - the program you'll most often use to run the git program and all of the git commands.
Install the github git tools for your platform by following the instructions in the video for this lesson.
When you are finished, move on to the next lesson.
Github is a website that allows developers to share their code with each other using a single git repository. The github git tools are a convenient set of well written tools that make interacting with git easier.
If you're using Windows, you can also download a tool set called "git extensions" that makes working with git in Windows even easier.
Check out the git extensions website below to learn more.
At its core, git is simply a database. More specifically, it's something known as a "flat-file" database. This means that the database itself is just stored in files. You can find these files by looking in your project directory for a folder called .git.
This folder contains all of the files that make up your git repository or "database".
You can create the git database by using the initialization command:
git init
This creates the .git folder and the initial structure of the database. However, it doesn't add anything to git. Even if there are many files inside of your folder, they won't be known to git until you tell git to keep track of them. Git uses something called an "index" to determine which files to keep track of.
I'll show you how to add files to the index later in the tutorial, but for now keep in mind that a repository starts out as an empty database and needs to be told what to watch for.
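Here's a minimal sketch of what that looks like at a terminal. The folder and file names are made up for the example, and the scratch folder comes from mktemp so nothing on your machine is touched.

```shell
# Create a scratch project folder with one file in it (names are hypothetical).
project=$(mktemp -d)
cd "$project"
echo "hello" > notes.txt

# Before initialization there is no .git folder in the listing.
ls -a

# Create the git database for this folder.
git init -q

# Now the .git folder exists...
ls -a
# ...but notes.txt is only listed as untracked ("??"): git has not been told to watch it yet.
git status --short
```

The repository exists at this point, but it is still empty as far as git is concerned.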
When you make a change to your code, git saves those changes in something called a "commit". You can think of a "commit" as a snapshot of your entire repository and all of its files at a specific point in time. If you start working with some files and break things, you can go back to a previous commit.
Most of git's commands operate on commits rather than files, so it's important to commit your code early and often. While git is a robust system, commits are its currency and in order for git's tools to work it needs to have a commit to refer to.
When in doubt, make a new commit. They're easy to do, cheap to create, and they can save you from a lot of trouble later on. Most of the problems I encounter and solve for clients are a result of them trying to do some git operation before they created a commit of their most recent changes.
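As a rough sketch of why commits are such a good safety net, the example below makes two commits and then brings a file back from the earlier one. The file name, commit messages, and the "Alice" identity are made up for the example, and the add and commit commands used here are explained in more detail later in this guide.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Commits record who made them; this identity is a placeholder for the sketch.
git config user.name "Alice"
git config user.email "alice@example.com"

# First snapshot: a working version of the file.
echo "working code" > app.txt
git add app.txt
git commit -q -m "Add working version of app.txt"

# Second snapshot: a change that turns out to be a mistake.
echo "broken code" > app.txt
git commit -q -a -m "Change app.txt"

# List the commits, newest first.
git log --oneline

# Bring back app.txt as it was one commit ago (HEAD~1 means "the commit before the latest one").
git checkout HEAD~1 -- app.txt
cat app.txt
```

Because the working version was committed before the change, getting it back is a single command.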
When you work with git, you'll very quickly learn about tools that allow you to share code with other developers. Some popular ones include "github" and "bitbucket".
Central to the idea of git is that each person working with code has his or her own "repository". Any time you create a "commit", you're committing to your own version of the git database contained on your computer. Other people can have a version of the database on their own computers as well, and they can commit to that without ever affecting your version of the repository.
In order to share code, you need to tell git to take the changes from your repository and share them with a common shared repository (one that you and everyone else can access). Git will then merge the commits together (taking care to preserve their order) and will update the remote repository so that it now contains the same commits as yours.
Alternatively, if you know someone has updated the shared repository, you can get those new commits from the shared repository and merge them into your local repository. This will give you the latest version of the code from everyone else using the code.
When working with git, there are two major ways that git keeps track of changes. The first is in commits, which contain the changes made to a repository across all files at a certain point in time. The second is an index, which git uses to keep track of the files it cares to look at.
In this section we'll explore the index and how you'll interact with it.
By default, git doesn't know about any of the files that you want to keep track of. You have to tell it to keep track of those files in order for it to know where to look for changes.
To keep track of these files, git stores them in an "index". The "index" is used to determine which files' changes should be in a commit, which files should be ignored, and which files aren't currently being tracked.
The following lessons will show you how to add or ignore files from being tracked in the index.
To tell git to add files to the index, you need to use the "git add" command followed by the name of the file you want to add. It looks something like this:
git add file_to_add.txt
Running this command at a terminal will tell git to start tracking the changes in that file whenever you perform a git operation. It also means that changes to this file will now be included when you make a commit.
You can tell git to track all new files by using the "git add" command and using a "dot" in place of a file name. Git will search through the current directory for any new files and add them to the tracking index.
Here I'm also using the "git status" command to show you which files have been added to the index and which ones are not yet being tracked.
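If you'd like to try this without the screencast, here's a small sketch you can run in a scratch folder. The second file name is made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "one" > file_to_add.txt
echo "two" > another_file.txt

# Nothing is tracked yet: both files are listed with "??".
git status --short

# Add a single file to the index by name. It is now listed with "A".
git add file_to_add.txt
git status --short

# Add everything else under the current directory.
git add .
git status --short
```

Running "git status" between steps is a cheap way to check what git will include in your next commit.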
Adding files is easy, but what if there are files you explicitly want git to never look at?
When performing add operations, git first looks for a text file with the name .gitignore. This file just contains a list of files for git to ignore, one on each line. An example would be the following:
# .gitignore
./c9
filetoignore.txt
this/is/a/directory/to/ignore
Git adds these to an 'ignore' list and will not add these files to its index OR keep track of changes to these files. It can be extremely helpful to ignore certain files because some files will be constantly changing and constantly keeping track of those changes may not be important. A good example is a file that is compiled or generated from other sources. You can re-generate the file as long as you track the changes for the original sources, and most likely you'll want to do that anyway. So you can safely ignore this generated file.
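Here's a short sketch of ignoring generated files. The patterns "build/" and "*.log" and all of the file names are assumptions chosen for the example, not something your project necessarily has.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# One pattern per line: a whole folder of generated output and any file ending in .log.
printf 'build/\n*.log\n' > .gitignore

mkdir build
echo "generated" > build/output.bin
echo "noise" > debug.log
echo "source" > main.txt

# Add everything in the folder. The ignored files are skipped.
git add .
git status --short

# Ask git directly whether a path is ignored (prints the path if it is).
git check-ignore debug.log
```

Only .gitignore itself and main.txt end up in the index, so the generated files never show up in a commit.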
A "remote" in git is a reference to the location of someone else's repository. If you're working on a team that's sharing a repository then you'll have a remote called "origin" which refers to the location of the shared repository.
A repository can have multiple remotes - which again are just references to someone else's repository. For example, you can have a reference to the repository your team shares (called "origin") and another reference to a specific coworker's repository (ie: "bob") where you can connect and get his changes early.
The git command for working with remotes is "git remote".
The "git remote" command is used to manage and update remote repository links (which, again, are just links to someone else's code).
To see a list of all of the current remote repository links, type
git remote -v
You may have a default remote called "origin" if you started your local repository by cloning someone else's. Typing "git remote -v" will show you the location of that remote repository.
Each of the remote links is named so you can specify which link you'd like to use. For example, you can create a remote link called "upstream" if you're working on code that you'd like to periodically update from another source. You can pull updates from that upstream to keep your code up-to-date with someone else's. This happens all the time in the open source world, where one version (or distro) of the Linux code base might be derived from another.
As you can see here, there are two different references for each named link - fetch and push. Don't worry about the difference between "fetch" and "push" for now. The important part is the name of the remote ("origin") and the location ("https://github.io/sax1johno/repo_name").
When you want to add a new reference to a remote repository, you can use the "git remote add" command. You'll then give your remote a name and specify the location of that remote.
git remote add newName https://github.io/sax1johno/test_repo
Running git remote -v after git remote add <name> <location> is a good way to confirm that adding the remote reference actually worked.
You can add as many remote references as you'd like - there's no practical limit.
If you've added a remote reference only to discover that you misspelled the location URL, or you'd simply like to get rid of one of the remotes, you can do so with the "git remote rm" command. The "git remote rm" command takes the name of the remote you'd like to remove and removes the reference.
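Putting the remote commands together, a session might look something like the sketch below. The "bob" remote and its example.com URL are made up for the example. Adding a remote only records the location, so nothing here needs a network connection.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Add a remote named "newName" and confirm it was recorded.
git remote add newName https://github.io/sax1johno/test_repo
git remote -v

# Suppose a second remote was added with a typo in its location (hypothetical name and URL).
git remote add bob https://example.com/bbo/test_repo
git remote -v

# Remove the bad reference by name. The other remote is left alone.
git remote rm bob
git remote -v
```

After the last command only "newName" is listed, once for fetch and once for push.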
Each git repository can have multiple different remotes that it can be connected to. This is helpful when you're working with code that is based on another common code base (also sometimes referred to as an "upstream").
A good example is customizing a content management system (such as wordpress or drupal) to meet the requirements of a specific project for different clients. The "upstream" remote would be the base project - for example, wordpress - while the origin remote would be your specific copy and modifications of that repository.
You can set up as many remotes as you'd like based on your project needs and requirements.
Cloning in git means making a local copy of some remote repository. The "clone" process gets the latest version of a remote repository and sets it up as "origin" in the local copy.
When a git repository is cloned, the entire history of the whole repository, from the moment it was created until the latest commit, is copied as well.
In a team setting, this means that there is a backup of the code for every developer on the team and, if the main repository that everyone shares goes down, the code and version history can be recovered from any other developer's local copy.
To create a new clone of another repository, use the "git clone" command followed by the URL of the repository you'd like to clone.
The "url" is just the remote location of another repository. Common ones start with "http" (ie: "https://github.com/sax1johno/acute.git") or use a special "git" name which contains the computer name and user without any prefix (ie: email@example.com:sax1johno/acute.git). This "git" location is also sometimes called the "ssh" location, as git uses the "ssh" protocol to actually copy the files from some other remote computer onto yours.
"git clone" also automatically sets up an initial remote repository called "origin" which references the url given to "git clone". To test this, run the "git remote -v" command after you've created a clone and you'll see the newly added origin.
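You can try cloning without a server by using a folder on your own machine as the "remote". In the sketch below the "shared" repository, its file, and the "Alice" identity are stand-ins made up for the example. In practice the url would point at a real server.

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for a remote repository with two commits of history.
git init -q shared
cd shared
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > readme.txt
git add readme.txt
git commit -q -m "First commit"
echo "v2" > readme.txt
git commit -q -a -m "Second commit"
cd ..

# Clone it. A local path works in place of the url here.
git clone -q shared my_copy
cd my_copy

# The whole history came along, not just the latest files.
git log --oneline
# "origin" was set up automatically and points at the repository we cloned from.
git remote -v
```

Both commits show up in the clone's log, which is what makes every clone a full backup.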
A git repository is structured as a collection of "branches". A "branch" is simply a separate copy of the code. You can create multiple branches in your repository, which allows you to work on multiple copies of the code base at the same time.
Branches are often used to separate experimental features from known working code. That way, if you end up creating code that causes issues or is more of a headache than it's worth, you can always go back to a known working version and start over on a new copy (or branch).
If you're working in an experimental branch and you end up with code changes that you'd like to keep, you can merge the commits that you made with those changes in one branch into any other branch.
Every repository starts out with a single main branch, called "master" (some newer setups name it "main" instead). This is also sometimes called the "trunk" as it's the default branch and other branches are created off of it. The master branch is what you'll use most of the time during development.
The "git branch" command is used to control and manage branching inside of git. You can create, destroy, and view the branches in your repository using "git branch". Switching between them, however, does NOT use "git branch" (we'll discuss that one later).
To create a branch from your current branch, use the following command:
git branch <branch_name>
You can view all of the branches you have by using "git branch" with no extra parameters.
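Here's a small sketch of creating and listing branches. The file name, branch name, and "Alice" identity are made up for the example. One thing to be aware of: a brand new repository needs at least one commit before you can create a branch from it.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "alice@example.com"

# A branch is created from a commit, so make one first.
echo "base" > base.txt
git add base.txt
git commit -q -m "First commit"

# Create a new branch from the current branch.
git branch mytestbranch

# List all branches. The one marked with "*" is the branch you are on.
git branch
```

Notice that creating the branch does not switch to it. You are still on the branch you started from.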
When you want to switch from one branch to another you need to "check out" the other branch. You can check out a branch by using the "git checkout" command followed by the name of the branch that you'd like to check out.
By checking out a branch, you're effectively changing files on your hard drive so that they match what git has stored in its database for that branch. Don't panic if you notice that files you were working on in another branch are now gone - this is normal and part of how git branches work. If you check out your original branch, your files will re-appear.
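To see the "disappearing files" effect for yourself, here's a sketch with made-up file and branch names. It renames the starting branch to master so it matches the text, since newer git installs may use a different default name.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "alice@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "First commit"
git branch -M master   # make sure the starting branch is named master for this sketch

# Create an experimental branch, switch to it, and commit a new file there.
git branch experiment
git checkout -q experiment
echo "new idea" > idea.txt
git add idea.txt
git commit -q -m "Try a new idea"

# Back on master, idea.txt is gone from the folder...
git checkout -q master
ls

# ...and it re-appears when the experiment branch is checked out again.
git checkout -q experiment
ls
```

The file was never lost. It lives in the commit on the experiment branch, and git puts it back whenever that branch is checked out.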
When you're done working on a branch and you no longer wish to keep it around, you can remove that branch by using the "git branch" command with a -d and the branch name after it. The -d signals to git that you'd like to remove that branch.
Be careful with this command as it's VERY difficult to recover data stored in a branch once that branch is deleted.
git branch -d <branch_name>
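As a sketch of how this behaves in practice: git will delete a branch with -d when its commits already exist on your current branch, and it refuses when the branch still holds work that hasn't been merged. The branch and file names below are made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "alice@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "First commit"
git branch -M master   # make sure the starting branch is named master for this sketch

# A branch with no extra commits can be removed right away.
git branch finished
git branch -d finished

# A branch holding a commit that master does not have is protected.
git branch unfinished
git checkout -q unfinished
echo "work in progress" > wip.txt
git add wip.txt
git commit -q -m "Work in progress"
git checkout -q master
git branch -d unfinished || echo "git refused to delete unfinished: it has unmerged commits"

# Only master and unfinished remain.
git branch
```

That refusal is a useful safety net, so it's worth reading the error message before reaching for a more forceful option.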
When you're ready to merge the changes you made in one branch to another you can use the "git merge" command.
To use it, first make sure that all of your code is committed to your current branch. Remember, git works on commits rather than files so you need to have a commit in order to merge anything.
Next you'll check out the branch that you want to merge those changes into. Typing git merge <branchname> will merge the changes from <branchname> into the branch you're currently on.
For example, suppose you have changes in mytestbranch that you'd like to merge into master. Starting in mytestbranch, make sure you've committed all of your changes. Then type git checkout master to move to the master branch. Now, to merge the changes from mytestbranch into master, type git merge mytestbranch.
This will take the changes from mytestbranch and merge them into the current branch (in this case, master).
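The same example can be run end to end in a scratch folder. This is a sketch with made-up file names and a placeholder "Alice" identity, and it renames the starting branch to master so the commands match the text.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "alice@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "First commit"
git branch -M master   # make sure the starting branch is named master for this sketch

# Do some work in mytestbranch and commit it.
git branch mytestbranch
git checkout -q mytestbranch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature.txt"

# Move to master and merge the changes in.
git checkout -q master
git merge -q mytestbranch

# feature.txt and its commit are now part of master.
ls
git log --oneline
```

Since master had no new commits of its own, git can simply move master forward to the same commit as mytestbranch.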
If there are differences between the two branches that cannot be reconciled, you'll get a "merge conflict". See the section on merge conflicts for details on what those are, how they can occur, and how to fix them.
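To give you an idea of what a conflict looks like, the sketch below changes the same line of the same file on two branches and then merges them. File names, contents, and the "Alice" identity are made up for the example, and the way the conflict is resolved here (overwriting the file with the text you want) is just one simple option.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "alice@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "First commit"
git branch -M master   # make sure the starting branch is named master for this sketch

# Change the line on mytestbranch...
git branch mytestbranch
git checkout -q mytestbranch
echo "hello from mytestbranch" > greeting.txt
git commit -q -a -m "Change greeting on mytestbranch"

# ...and change the same line differently on master.
git checkout -q master
echo "hello from master" > greeting.txt
git commit -q -a -m "Change greeting on master"

# Git cannot pick a winner, so the merge stops and reports a conflict.
git merge mytestbranch || echo "merge stopped because of a conflict"

# The file now contains both versions, separated by conflict markers.
cat greeting.txt

# Resolve it by writing the content you actually want, then add and commit.
echo "hello from both branches" > greeting.txt
git add greeting.txt
git commit -q -m "Merge mytestbranch, resolving the conflict in greeting.txt"
```

The final commit records the merge along with your decision about which text to keep.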
Git works by using its database to determine what your files and folders should be. Then it modifies your local files and folders to match what it has in its database.
This can result in some very unsettling behavior. If you're working on a project in one branch and switch branches, you will see all of the changes you made disappear in those files. Don't Panic! Your changes are not gone - they're saved up in git's database of changes.
If you want the changes from a different branch to make it to your current branch (ie: if you want your files back), just use the following command to merge them in:
git merge <branch_name>
You'll see all of your changes magically re-appear.
Branches are one of the most powerful and versatile features of git. However, it can also be difficult for beginners to wrap their heads around when and how they should use branches.
There are many different strategies for using branches in git and all of them are valid ways of using branches. What matters more than the strategy you choose is that your team agrees upon and understands the strategy chosen.
For a guide on some of the strategies recommended by the creators of git, check out the branching workflows section of the git documentation.
When you're using git, you'll most likely use the following 4 commands maybe 95% of the time, and maybe more. If you use these 4 commands, in the order mentioned here, you'll avoid most of the problems that beginner git users encounter.
The commands in order are as follows:
git add .
git commit -m <and a message in quotes>
git pull origin master
git push origin master
When you're working in git, using these 4 commands in exactly this order will save you a lot of headache and heartache. In this section, I'll go through each one of these commands and explain what they are and how they relate to the other commands you'll use.
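Before going through them one by one, here's a sketch of the whole cycle running against a stand-in for a shared repository. The "shared.git" folder plays the role of the remote (in practice it would live on a site like github), and the file names, messages, and "Alice" identity are made up for the example.

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for the shared repository: a bare repository in a local folder.
git init -q --bare shared.git

# Alice's local copy, with "origin" pointing at the shared repository.
git clone -q shared.git alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"

# One-time setup for this sketch: publish a first commit so the remote has a master branch.
echo "first" > file.txt
git add .
git commit -q -m "Add file.txt"
git branch -M master   # make sure the branch is named master for this sketch
git push -q origin master

# The everyday cycle: make a change, then run the 4 commands in order.
echo "second" >> file.txt
git add .
git commit -q -m "Update file.txt"
git pull -q origin master
git push -q origin master
```

Pulling before pushing means any commits your teammates shared in the meantime are merged on your machine first, where you can deal with any conflicts.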
The git add command adds files to the "index", where git can track them. To add an entire folder and all of its subfolders, use the following:
git add .
The "." at the end signifies the current working directory (CWD). It tells git to start at the current directory, search through all of its child directories, and add any new or changed files it finds to the index.
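Here is a small sketch of what that looks like in practice, assuming git and a POSIX shell are available. The folder and file names (src/utils, top.txt, helper.txt) are invented for the example.

```shell
set -e
# Throwaway repository for the demo.
demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q

# One file at the top level and one buried two folders down.
mkdir -p src/utils
echo "a" > top.txt
echo "b" > src/utils/helper.txt

# Before adding: git lists these as untracked (marked "??").
git status --short

# Add everything from the current directory downward.
git add .

# git ls-files prints what is now in the index.
staged=$(git ls-files)
echo "$staged"
```

Both files end up in the index, including the one in the nested folder, without having to name either of them.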
The git commit command is used to add a commit to the git database (repository) on the current branch. To add a commit with the latest changes to the git repository, use the following command:
git commit -a -m "Commit Message goes Here"
git commit itself creates the commit. The -a flag tells git to include every file it is already tracking that has been modified or removed; brand-new files still need a git add first. The -m flag allows you to write the message for the commit right on the command line. If you don't include the -m flag, many environments will open a text editor that allows you to add a message. Oftentimes this text editor is an older one called "vim" that may be unfamiliar to you. To keep things simple, it's often best to just include the message on the command line.
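One detail that trips people up is that -a only covers files git already tracks. The sketch below shows this in a throwaway repository; the file names and the "Demo User" identity are placeholders, and it assumes git and a POSIX shell.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# First commit: start tracking tracked.txt.
echo "v1" > tracked.txt
git add .
git commit -q -m "Start tracking tracked.txt"

echo "v2" > tracked.txt       # modify a file git already tracks
echo "new" > untracked.txt    # create a file git has never seen

# -a picks up the modified tracked file; -m supplies the message inline.
git commit -q -a -m "Update tracked.txt"

# Files changed by the newest commit, and what is still left over.
committed=$(git diff --name-only HEAD~1 HEAD)
leftover=$(git status --short)
echo "committed: $committed"
echo "left over: $leftover"
```

Only tracked.txt makes it into the commit. The new file is still waiting for a git add, which is one reason running git add . first is a good habit.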
When working with a remote repository (i.e., on GitHub or another site), you'll want to synchronize your repository with the remote repository by getting and merging the changes from the remote into your local repository. To do this, use the following command:
git pull origin master
git pull is a command that combines a git fetch with a git merge, meaning that git will get all of the commit differences from the remote repository and merge them into your local repository in one step.
origin refers to the name of the remote repository and master refers to the branch name.
You can pull from other repositories by changing origin to the name of the remote, and merge other branches by changing master to the name of another branch. For example, to merge the test branch of the github remote into your local repository, use the following command:
git pull github test
NOTE: This command should ALWAYS be run AFTER you've committed your latest changes to your local repository. Doing a git pull before you've committed your changes makes recovering from a failed merge very difficult.
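To make the "fetch plus merge" idea concrete, here is a rough sketch that does the two steps by hand. It assumes git and a POSIX shell, uses a bare repository on disk as a stand-in for a real remote like GitHub, and the folder names mine and teammate are hypothetical.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# Placeholder identity so commits work without any git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# A bare repository stands in for the remote.
git -c init.defaultBranch=master init -q --bare remote.git

# Your repository: one commit, pushed up to the remote.
git -c init.defaultBranch=master init -q mine
cd "$demo/mine"
git remote add origin "$demo/remote.git"
echo "one" > notes.txt
git add .
git commit -q -m "First commit"
git push -q origin master

# A teammate clones, commits, and pushes a change.
git clone -q "$demo/remote.git" "$demo/teammate"
cd "$demo/teammate"
echo "two" >> notes.txt
git commit -q -a -m "Teammate's change"
git push -q origin master

# Back in your repository, do by hand what git pull does in one step.
cd "$demo/mine"
git fetch -q origin            # download the new commit
git merge -q origin/master     # merge it into your local master
commit_count=$(git rev-list --count HEAD)
echo "commits after fetch + merge: $commit_count"
```

Running git pull origin master in place of the last fetch and merge should leave the repository in the same state.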
Once you've pulled the changes from the remote repository, you'll want to update the remote repository with the newly merged source code from your local repository. To do this, use the following command:
git push origin master
git push indicates that the current repository changes should be sent to a remote repository. Since your git pull command has already merged the changes from the remote repository with your local repository, git push simply sends those new commits up to the remote repository.
If you attempt to do a git push without first pulling down and merging the changes from the remote repository, you may receive a "non-fast-forward" error. This error occurs when the remote repository has commits that you haven't pulled and merged yet. Simply perform the git pull before git push and you shouldn't encounter these errors.
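The sketch below reproduces that rejected push and then fixes it with a pull. As before, it assumes git and a POSIX shell, uses a bare repository on disk in place of a real remote, and the names mine and teammate are made up. The --no-rebase and --no-edit options are additions for the demo: the first asks for a plain merge (newer versions of git may ask you to choose a strategy when histories have diverged), and the second accepts the default merge message so no editor opens.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# Placeholder identity so commits work without any git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Remote stand-in plus your repository with one pushed commit.
git -c init.defaultBranch=master init -q --bare remote.git
git -c init.defaultBranch=master init -q mine
cd "$demo/mine"
git remote add origin "$demo/remote.git"
echo "one" > mine.txt
git add .
git commit -q -m "First commit"
git push -q origin master

# A teammate pushes a commit that you don't have yet.
git clone -q "$demo/remote.git" "$demo/teammate"
cd "$demo/teammate"
echo "theirs" > theirs.txt
git add .
git commit -q -m "Teammate's commit"
git push -q origin master

# You commit locally and try to push WITHOUT pulling first.
cd "$demo/mine"
echo "two" >> mine.txt
git commit -q -a -m "My second commit"
if git push -q origin master 2>/dev/null; then first_push="accepted"; else first_push="rejected"; fi

# Pull (fetch + merge), then push again.
git pull -q --no-rebase --no-edit origin master
if git push -q origin master; then second_push="accepted"; else second_push="rejected"; fi

echo "first push: $first_push, second push: $second_push"
```

The first push is turned away because the remote has a commit your repository has never seen. After the pull merges it in, the second push goes through.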
origin refers to the name of the remote repository and master refers to the branch name.
You can push to other repositories by changing origin to the name of the remote, and push to another remote branch by changing master to the name of another branch. For example, to push your local test branch to the test branch of a remote repository called github, use the following command:
git push github test
You can also push your local test branch to a remote master branch by doing the following:
git push origin test:master
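The part after the remote name reads as local branch, then a colon, then remote branch. Here is a short sketch that checks the result, assuming git and a POSIX shell, with a bare repository on disk standing in for the remote and a made-up folder name mine.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# Placeholder identity so commits work without any git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Remote stand-in plus a local repository with one pushed commit on master.
git -c init.defaultBranch=master init -q --bare remote.git
git -c init.defaultBranch=master init -q mine
cd "$demo/mine"
git remote add origin "$demo/remote.git"
echo "one" > notes.txt
git add .
git commit -q -m "First commit"
git push -q origin master

# Do some work on a local branch named test.
git checkout -q -b test
echo "two" >> notes.txt
git commit -q -a -m "Work on test"

# Send the local test branch to the remote's master branch.
git push -q origin test:master

# Compare the commit IDs on both sides.
local_test=$(git rev-parse test)
remote_master=$(git ls-remote origin refs/heads/master | cut -f1)
echo "local test:    $local_test"
echo "remote master: $remote_master"
```

The two commit IDs match, meaning the remote's master now points at the same commit as your local test branch.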
John O'Connor is an Entrepreneur, Engineer, and Educator who has built and led hundreds of products from Aerospace (Northrop Grumman and NASA) to Cyberspace (AT&T, Universal/NBC, ...) and started or helped others jumpstart over a dozen startups (CardBlanc, NapkinFinance, Spop, TAILS, ...). You could say that he's been around the block (or the planet)!
As an educator, John is highly regarded by his students and other faculty. He's taught both online (for Bloc, Inc, CodeMentor, and Thinkful) and as an Associate Professor of Computer Science at Norco College.
John has a B.S. in Computer Science from CSU San Bernardino, and is currently pursuing an M.S. in Computer Science at Georgia Tech (Go Yellow Jackets!).