How to Tell a File's Format: Five Open Source Tools
5.0 (1 rating)
21 students enrolled
A practical introduction to five software tools to identify file formats and extract metadata
Created by Gary McGath
Last updated 1/2016
English
Includes:
  • 1.5 hours on-demand video
  • 8 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Install and use software tools to identify file formats and extract metadata.
Requirements
  • Students should be comfortable with downloading and installing software. Familiarity with the Linux/Unix command line is a great help.
Description

You will learn how to use five free, open-source tools to identify the format, version, and profile of document files and obtain their metadata. If you're working in library and archive technology, or if you're a student preparing for this career, the course will give you a strong start in using those tools and understanding their strengths and weaknesses. The five central sections each cover one of these tools:

file: A command line tool included in Linux and Unix for simple file identification.

DROID: A batch-oriented tool from the UK National Archives, using the PRONOM format registry.

ExifTool: A metadata extraction tool that recognizes a broad range of formats.

JHOVE: Software developed at the Harvard University Library for careful validation of certain formats. I wrote most of the code for JHOVE.

Apache Tika: Content extraction software which can identify many formats.

For each tool, there's a discussion of how to use it followed by an on-screen demonstration of installing and using it, as well as a downloadable PDF summarizing the material.

You should be comfortable with installing software on your computer. Familiarity with the Unix/Linux command line is strongly recommended. Most, but not all, of the tools described can run on Windows. All will run on a Macintosh or Linux system.

Who is the target audience?
  • Students and professionals working with digital archives are the main audience for this course. Others whose work involves file identification, e.g., in digital forensics, may find value in it.
Curriculum For This Course
21 Lectures
01:29:28
Introduction
3 Lectures 11:57

A general description of the course and the material it will cover.

Preview 03:55

A review of common Linux/Unix command lines for those who would like a refresher.

Unix/Linux command line
3 pages

Concepts of file format identification: Format descriptions, versions, profiles, and metadata. The relationship of the specification to real-life files. Tools with broad coverage vs. tools that examine a few formats in detail.

Basic concepts
08:02
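
As a rough illustration of the gap between a specification and real-life files (this sketch is not taken from the course materials), consider the version a PDF declares in its header. The PDF specification places a `%PDF-M.m` header at the very start of the file, while some readers are reported to accept it anywhere in the first 1024 bytes. The function name `pdf_declared_version` and the `strict` parameter are hypothetical, chosen only for this example.

```python
import re
from typing import Optional

# A PDF header looks like "%PDF-1.4". The pattern works on raw bytes.
PDF_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_declared_version(data: bytes, strict: bool = True) -> Optional[str]:
    """Return the version a PDF claims in its header, or None if absent.

    strict=True  : the header must be the first thing in the file.
    strict=False : the header may appear anywhere in the first 1024 bytes
                   (an assumption modeled on lenient readers).
    """
    window = data[:1024]
    match = PDF_HEADER.match(window) if strict else PDF_HEADER.search(window)
    if match is None:
        return None
    return f"{match.group(1).decode()}.{match.group(2).decode()}"
```

The declared version is only a claim made by the file. A validating tool would go on to check whether the content actually conforms to it.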
file
2 Lectures 06:40

What file is, its strengths and weaknesses.

Preview 02:04

A demonstration, using screen capture, of how to use file. After completing this lecture, you should be able to use file on your own.

Using file
04:36
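
For readers curious about what happens under the hood, file relies largely on "magic numbers": characteristic byte sequences near the start of a file. The sketch below is a much-simplified illustration of that idea, not file's actual implementation. The signature table is a tiny hand-picked sample, and the function names are hypothetical.

```python
# A tiny, illustrative signature table. Real tools ship far larger
# databases and also test bytes at offsets other than zero.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_bytes(header: bytes) -> str:
    """Return a format name based on the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_path(path: str) -> str:
    """Read the first few bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))
```

A match on the leading bytes says nothing about whether the rest of the file is well formed, which is one reason simple identification and careful validation are treated as separate tasks.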
DROID
5 Lectures 18:58

What DROID is, its relationship to the PRONOM format registry, and how to download it.

Introduction to DROID
02:39

How to install DROID on a Unix/Linux/Mac system.

Installing DROID
02:06

How to create, add files to, and run profiles in the DROID GUI application.

Working with DROID profiles
03:51

How to export data and get reports from the DROID GUI application.

Getting information from a DROID profile
06:30

Running DROID from the command line
03:52
ExifTool
3 Lectures 14:04

What ExifTool is, what's required to run it, and what it's useful for.

Introduction to ExifTool
03:15

How to download and install ExifTool.

Installing ExifTool
05:52

How to use ExifTool to get information about files.

Running ExifTool
04:57
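
ExifTool can also be driven from a script. The following is a minimal sketch, assuming an `exiftool` executable is on the PATH. It uses ExifTool's `-json` option, which prints a JSON array with one object per input file. The helper names are hypothetical and are not part of ExifTool or of the course materials.

```python
import json
import subprocess
from typing import Any, Dict, List


def exiftool_command(path: str, executable: str = "exiftool") -> List[str]:
    """Build the argument list for one file, requesting JSON output."""
    return [executable, "-json", path]


def parse_exiftool_json(output: str) -> Dict[str, Any]:
    """Return the tag dictionary for the first (only) file in the output."""
    records = json.loads(output)
    return records[0] if records else {}


def read_metadata(path: str) -> Dict[str, Any]:
    """Run ExifTool on a file and return its tags (needs exiftool installed)."""
    result = subprocess.run(
        exiftool_command(path), capture_output=True, text=True, check=True
    )
    return parse_exiftool_json(result.stdout)
```

With ExifTool installed, `read_metadata("photo.jpg").get("MIMEType")` would return whatever media type ExifTool reports for that file. The file name here is only a placeholder.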
JHOVE
4 Lectures 22:21

What JHOVE is, a bit of its history, and a discussion of its strengths and weaknesses.

Preview 04:36

How to download JHOVE, install it, and launch the command line and GUI versions.

Downloading and installing JHOVE
04:23

Using JHOVE from the command line to identify and validate files, with some tricks for difficult file names.

Running JHOVE from the command line
07:57
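
One general way to sidestep trouble with awkward file names (spaces, quotes, and the like) is to call JHOVE from a script that passes arguments as a list, so no shell quoting is involved. This is a sketch of that general technique and not necessarily one of the tricks the lecture covers. It assumes a `jhove` launcher on the PATH and uses JHOVE's `-m` (module) and `-h` (output handler) options; the function names are hypothetical.

```python
import subprocess
from typing import List, Optional


def jhove_command(
    path: str,
    module: Optional[str] = None,
    handler: str = "xml",
    executable: str = "jhove",
) -> List[str]:
    """Build a JHOVE argument list.

    module  : a module name such as "PDF-hul"; None lets JHOVE try its modules.
    handler : output handler, for example "text" or "xml".
    """
    command = [executable, "-h", handler]
    if module is not None:
        command += ["-m", module]
    command.append(path)  # passed as one argument, whatever it contains
    return command


def run_jhove(path: str, module: Optional[str] = None) -> str:
    """Run JHOVE and return its output (needs JHOVE installed)."""
    result = subprocess.run(
        jhove_command(path, module), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Because the path is a single list element, a name like `my file's name.pdf` reaches JHOVE unchanged.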

How to use the JHOVE GUI application to identify and validate files, using all modules or selecting a module.

Running the JHOVE GUI application
05:25
Apache Tika
3 Lectures 10:10

What Tika is, its strengths and weaknesses, and why using it in server mode is the way to go.

Introduction to Tika
02:07
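
To give a feel for server mode: a running Tika server accepts files over HTTP, so the Java virtual machine starts once and not once per file. That is one commonly cited advantage, and the lecture may give others. The sketch below assumes a Tika server already listening on its default address, `http://localhost:9998`, and uses its `/detect/stream` endpoint. The function names are hypothetical.

```python
import urllib.request


def build_detect_request(
    data: bytes, base_url: str = "http://localhost:9998"
) -> urllib.request.Request:
    """Build a PUT request asking the Tika server to detect a media type."""
    url = base_url.rstrip("/") + "/detect/stream"
    return urllib.request.Request(url, data=data, method="PUT")


def detect_media_type(path: str, base_url: str = "http://localhost:9998") -> str:
    """Send a file to a running Tika server and return the detected type."""
    with open(path, "rb") as handle:
        request = build_detect_request(handle.read(), base_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()
```

Only `detect_media_type` needs a live server. `build_detect_request` can be inspected on its own.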

How to download Apache Tika, install it, and run the GUI application.

Installing Tika and running the GUI app
03:25

How to run Tika from the command line. A script is provided (see resource file) to make this easier.

Running Tika from the command line
04:38
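
The script supplied with the course is not reproduced here. As a separate, minimal sketch of wrapping the Tika command line, the helper below builds a `java -jar` invocation using the Tika app's `--detect`, `--metadata`, and `--json` options. The jar file name `tika-app.jar` is a placeholder for whichever version you downloaded, and the function name is hypothetical.

```python
from typing import List

# Tika app options used here: detect the type, dump metadata as text,
# or dump metadata as JSON.
TIKA_MODES = {
    "detect": "--detect",
    "metadata": "--metadata",
    "json": "--json",
}


def tika_command(
    path: str, mode: str = "detect", jar: str = "tika-app.jar"
) -> List[str]:
    """Build the argument list for one Tika app run.

    jar is a placeholder name; point it at the jar you actually installed.
    """
    if mode not in TIKA_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["java", "-jar", jar, TIKA_MODES[mode], path]
```

The resulting list can be handed to `subprocess.run` when Java and the Tika jar are available. Each run starts a new Java process.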
Review
1 Lecture 02:18

Review questions on the course.

Final quiz
5 questions

Review and recommendations for further study.

Closing remarks
02:18
About the Instructor
Gary McGath
5.0 Average rating
3 Reviews
34 Students
3 Courses
Software Engineer

I'm an experienced software developer with a strong background in Java, library software, and digital preservation. For eight years I was a software engineer for the Harvard Library. I wrote the bulk of the code for JHOVE, a file identification and analysis tool widely used by libraries and archives. My written work includes the e-book Files that Last and the blog Mad File Format Science.