How to Tell a File's Format: Five Open Source Tools

A practical introduction to five software tools to identify file formats and extract metadata
5.0 (1 rating)
21 students enrolled
Created by Gary McGath
Last updated 12/2015
  • 1.5 hours on-demand video
  • 8 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
Install and use software tools to identify file formats and extract metadata.
Requirements
  • Students should be comfortable with downloading and installing software. Familiarity with the Linux/Unix command line is a great help.

You will learn how to use five free, open-source tools to identify the format, version, and profile of document files and obtain their metadata. If you're working in library and archive technology, or if you're a student preparing for this career, the course will give you a strong start in using those tools and understanding their strengths and weaknesses. The five central sections each cover one of these tools:

file: A command line tool included in Linux and Unix for simple file identification.

DROID: A batch-oriented tool from the UK National Archives, using the PRONOM format registry.

ExifTool: A metadata extraction tool that recognizes a broad range of formats.

JHOVE: Software developed at the Harvard University Library for careful validation of certain formats. I wrote most of the code for JHOVE.

Apache Tika: Content extraction software which can identify many formats.

For each tool, there's a discussion of how to use it followed by an on-screen demonstration of installing and using it, as well as a downloadable PDF summarizing the material.

You should be comfortable with installing software on your computer. Familiarity with the Unix/Linux command line is strongly recommended. Most, but not all, of the tools described can run on Windows. All will run on a Macintosh or Linux system.
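
As a rough illustration of the signature matching that a tool such as file relies on, here is a minimal Python sketch. The signature table and the identify function are assumptions made up for this example, not code from any of the five tools, which use far larger signature sets and additional checks.

```python
# Hypothetical, deliberately tiny signature table: leading bytes -> format label.
# Real tools use much larger registries (for example, DROID draws on PRONOM).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data: bytes) -> str:
    """Return a format label based only on the first bytes of the data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_path(path: str) -> str:
    """Read the first 16 bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

A matching signature only suggests a format. It does not show that the file is valid, which is the gap that a validating tool such as JHOVE is meant to address.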

Who is the target audience?
  • Students and professionals working with digital archives are the main audience for this course. Others whose work involves file identification, e.g., in digital forensics, may find value in it.
Curriculum For This Course
21 Lectures 01:29:28
3 Lectures 11:57

A general description of the course and the material it will cover.

Preview 03:55

A review of common Linux/Unix commands for those who would like a refresher.

Unix/Linux command line
3 pages

Concepts of file format identification: Format descriptions, versions, profiles, and metadata. The relationship of the specification to real-life files. Tools with broad coverage vs. tools that examine a few formats in detail.

Basic concepts
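
To make the distinction between a format and its version concrete, the following minimal Python sketch reads the version that a PDF file declares in its header. The function name pdf_version is an assumption for this example and is not taken from the course materials. The header is only what the file claims about itself, so a real-life file may not conform to the version it declares.

```python
import re

# A PDF file conventionally begins with a header such as "%PDF-1.4".
PDF_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_version(data: bytes):
    """Return the version declared in a PDF header, or None if there is none."""
    match = PDF_HEADER.match(data)
    if match is None:
        return None
    return match.group(1).decode("ascii")
```

For example, calling pdf_version(b"%PDF-1.7\n") gives the string "1.7", while data that does not start with a PDF header gives None.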
2 Lectures 06:40

What file is, its strengths and weaknesses.

Preview 02:04

A demonstration, using screen capture, of how to use file. After completing this lecture, you should be able to use file on your own.

Using file
5 Lectures 18:58

What DROID is, its relationship to the PRONOM format registry, and how to download it.

Introduction to DROID

How to install DROID on a Unix/Linux/Mac system.

Installing DROID

How to create, add files to, and run profiles in the DROID GUI application.

Working with DROID profiles

How to export data and get reports from the DROID GUI application.

Getting information from a DROID profile

Running DROID from the command line
3 Lectures 14:04

What ExifTool is, what's required to run it, and what it's useful for.

Introduction to ExifTool

How to download and install ExifTool.

Installing ExifTool

How to use ExifTool to get information about files.

Running ExifTool
4 Lectures 22:21

What JHOVE is, a bit of its history, and a discussion of its strengths and weaknesses.

Preview 04:36

How to download JHOVE, install it, and launch the command line and GUI versions.

Downloading and installing JHOVE

Using JHOVE from the command line to identify and validate files, with some tricks for difficult file names.

Running JHOVE from the command line

How to use the JHOVE GUI application to identify and validate files, using all modules or selecting a module.

Running the JHOVE GUI application
Apache Tika
3 Lectures 10:10

What Tika is, its strengths and weaknesses, and why using it in server mode is the way to go.

Introduction to Tika

How to download Apache Tika, install it, and run the GUI application.

Installing Tika and running the GUI app

How to run Tika from the command line. A script is provided (see resource file) to make this easier.

Running Tika from the command line
1 Lecture 02:18

Review questions on the course.

Final quiz
5 questions

Review and recommendations for further study.

Closing remarks
About the Instructor
5.0 Average rating
2 Reviews
31 Students
3 Courses
Software Engineer

I'm an experienced software developer with a strong background in Java, library software, and digital preservation. For eight years I was a software engineer for the Harvard Library. I wrote the bulk of the code for JHOVE, a file identification and analysis tool widely used by libraries and archives. My written work includes the e-book Files that Last and the blog Mad File Format Science.
