Name: Palantir Foundry Pipelines and Dataset Bootcamp
Rating: 4.2 (31 reviews)

Created by Emma Saunders

Last updated 12/2025

English

What you'll learn

Learn the Foundry Pipelines interface, core concepts, and workflow so you can navigate confidently and prepare to build your first pipeline.
Learn how to import, connect to, and upload raw data from multiple sources so you can reliably bring datasets into Foundry.
Learn to clean, add to, split, merge, test and validate data using Foundry transforms so you can build clear, maintainable pipelines.
Learn to publish datasets, set write modes, schedule builds, run tests, and prepare data for reliable downstream use.

Course content

6 sections • 28 lectures • 2h 12m total length

Introduction (1:10)
Get an overview of what this course covers, who it's for, and how you'll learn to build and operate data pipelines using Foundry.

What are Foundry Pipelines? (2:45)
Learn what Foundry Pipelines are, how they fit into the wider Foundry ecosystem, and why they provide a powerful framework for transforming raw data into usable datasets.
Navigating Foundry Pipelines (5:33)
Explore the Pipeline Builder interface, including key panels, the graph view, and where to find inputs, outputs, transforms, and build information.
Creating an Empty Pipeline (4:18)
Learn how to navigate Foundry's Ontology structure, understand the difference between input and output data folders, and walk through the creation of a new empty pipeline, choosing between Batch, Standard, Streaming, Spark, DataFusion and External processing options.
Section 2 Quiz

What this section will cover (0:56)
If Pipeline Builder is Foundry's Extract-Transform-Load offering, this section focuses on Extract. We will look at how to get data into the pipeline, starting with the easiest option, which is importing an existing Foundry dataset. Then we move on to connecting to an externally hosted dataset. While conceptually simple, we see the many ways in which such a connection can fail, and identify the easiest way to get public data into Foundry. Then we move on to private connections by trying to access password-protected data. We explain the difference between Authorization, Egress policies and Host keys, all of which are required by Foundry to bring external private data into Foundry.
Importing Data from within Foundry (4:55)
Learn how to import existing Foundry datasets into a Pipeline, including the many types of supported data. Learn the difference between a Snapshot and an Incremental computation.
Import Public Data into Foundry (5:35)
Foundry is not set up to handle public data; it tends to assume all data is private. In this lecture, learn the easiest ways of importing externally available public datasets into Foundry.
Import Private Data into Foundry (4:19)
We move on to private connections by trying to access password-protected data. We explain the difference between Authorization, Egress policies and Host keys, all of which are required by Foundry to bring external private data into Foundry.
Section 3 Quiz

What this section will cover (0:59)
Pipeline Builder is Foundry's Extract-Transform-Load offering. In this section, we focus on arguably the most fun part: the transformations. Here we start off by converting JSON files to be tabular using built-in handlers. Then, once we have some data in our pipeline, we explore all the new commands available to us. We build increasingly complex transformations, starting off with splitting datasets and adding columns based on cell values, moving to chaining transformations, and ending by combining regular expression pattern matching with ANDs and ORs to build conditional logic. We also merge datasets in two ways, with joins and unions. And at the end we see how much time and how many errors a function can save: one function can easily replace scores of manually created transformation nodes.
Transformation Options and Handling JSON (5:29)
Learn how to parse and manipulate JSON fields inside your pipelines and explore the different transformation options available in Foundry. We use a snippet of JSON to let Foundry auto-generate the schema.
Making Sense of GeoJSON (4:56)
In this lecture, we take a raw GeoJSON dataset and walk through the process of converting it into a clean tabular form inside Foundry. You'll see why using a full JSON snippet to generate a schema produces incorrect columns, and how to fix it by understanding the quirks of GeoJSON.
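As an illustration of what "converting GeoJSON into a clean tabular form" involves, here is a minimal sketch in plain Python. It is not Foundry's JSON handler, and the sample data and the `flatten_features` helper are hypothetical. It relies only on the standard GeoJSON layout: a FeatureCollection holds its records in a nested `features` array, so the top level of the document has just two keys, which is one plausible reason a schema inferred from the whole snippet looks wrong.

```python
import json

# Hypothetical sample data; the course's actual dataset is not shown in the listing.
RAW = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "A", "category": "x"},
     "geometry": {"type": "Point", "coordinates": [-0.1, 51.5]}},
    {"type": "Feature",
     "properties": {"name": "B", "category": "y"},
     "geometry": {"type": "Point", "coordinates": [2.35, 48.85]}}
  ]
}
"""


def flatten_features(geojson_text):
    """Return one flat dict (one table row) per GeoJSON feature."""
    collection = json.loads(geojson_text)
    rows = []
    for feature in collection["features"]:
        # Each property becomes a column.
        row = dict(feature.get("properties") or {})
        geometry = feature.get("geometry") or {}
        row["geometry_type"] = geometry.get("type")
        coords = geometry.get("coordinates")
        # GeoJSON stores point coordinates as [longitude, latitude].
        if geometry.get("type") == "Point" and coords:
            row["longitude"], row["latitude"] = coords[0], coords[1]
        rows.append(row)
    return rows


rows = flatten_features(RAW)
for row in rows:
    print(row)
```

The key step is iterating over `features` rather than over the top-level object, so that each feature, not the whole file, becomes a row.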
Navigating Pipelines and Importing Functions (3:42)
Now that our pipeline contains data, we have a host of new commands open to us. This lecture explains those commands, including how to import a simple Python function from repos. Although we don't go into detail on functions, we learn why it is advisable to centralise some code across your organisation.
Creating Basic Transforms: Splits and Static Columns (7:18)
We use basic transforms to split our dataset based on a field, and to create new static and derived columns from values within the dataset.
Chaining Commands in a Single Transformation (7:21)
We learn how to chain multiple expressions together inside a single transformation to streamline our logic and keep the pipeline clean. We see that each step of this process creates a new column in the Preview.
Combining Transformations with Regex, Ands and Ors (6:44)
We start making more complex transformations by using pattern-matching techniques (regular expressions) and create conditional logic by combining AND/OR expressions.
Combining our Outputs to a Final Validity Column (3:54)
Until this point we have tested different aspects of designator validity. If there are numbers, are they in range? If there are letters, are they the allowed values? Now we combine those tests into one final validity column.
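The logic described here, a regular expression for the overall shape followed by range and allowed-value checks joined with AND/OR, can be sketched in plain Python. This is not the course's Pipeline Builder configuration. The designator format, the allowed letters and the numeric range below are all assumptions chosen for illustration, since the listing does not specify them.

```python
import re

# Assumed shape: one or two digits, optionally followed by a single letter.
DESIGNATOR = re.compile(r"^(\d{1,2})([A-Z]?)$")
ALLOWED_LETTERS = {"L", "C", "R"}  # hypothetical allowed values
MIN_NUMBER, MAX_NUMBER = 1, 36     # hypothetical valid range


def is_valid_designator(value):
    """Combine the pattern, range and letter checks into one boolean."""
    match = DESIGNATOR.match(value.strip().upper())
    if match is None:
        return False
    number, letter = int(match.group(1)), match.group(2)
    number_ok = MIN_NUMBER <= number <= MAX_NUMBER       # are the numbers in range?
    letter_ok = letter == "" or letter in ALLOWED_LETTERS  # OR: no letter, or an allowed one
    return number_ok and letter_ok                        # AND: both tests must pass


results = {d: is_valid_designator(d) for d in ["09", "27L", "40", "18X", "R"]}
print(results)
```

Keeping `number_ok` and `letter_ok` as separate intermediate values mirrors the lecture's approach of building individual test columns first and combining them only at the end.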
Exploring Joins and Unions (7:39)
We combine datasets using joins and unions, understanding that they are horizontal and vertical merges respectively. We also resolve mismatched schemas and learn the difference between left, right and inner joins.
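To make the horizontal/vertical distinction concrete, here is a small sketch using plain Python lists of dicts as stand-in tables. The `union` and `join` helpers and the sample rows are hypothetical and are not Foundry's implementation. A right join is not shown separately because it is a left join with the two inputs swapped.

```python
def union(top_rows, bottom_rows):
    """Vertical merge: stack rows, filling columns missing on either side with None."""
    all_rows = top_rows + bottom_rows
    columns = []
    for row in all_rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    return [{col: row.get(col) for col in columns} for row in all_rows]


def join(left_rows, right_rows, key, how="inner"):
    """Horizontal merge: add the right table's columns to matching left rows."""
    if how not in ("inner", "left"):
        raise ValueError("how must be 'inner' or 'left'")
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    right_cols = list(dict.fromkeys(c for r in right_rows for c in r if c != key))
    out = []
    for left in left_rows:
        matches = index.get(left[key], [])
        for match in matches:
            out.append({**left, **match})
        if not matches and how == "left":
            # A left join keeps unmatched left rows, with empty right-hand columns.
            out.append({**left, **{c: None for c in right_cols}})
    return out


sites = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}, {"id": 3, "name": "C"}]
sizes = [{"id": 1, "size": "large"}, {"id": 2, "size": "small"}, {"id": 4, "size": "medium"}]

inner = join(sites, sizes, "id", how="inner")
left = join(sites, sizes, "id", how="left")
stacked = union([{"id": 1, "name": "A"}], [{"id": 9, "city": "X"}])
print(inner, left, stacked, sep="\n")
```

The inner join drops site 3 because it has no match, the left join keeps it with an empty `size`, and the union resolves the mismatched schemas by giving every row the combined set of columns.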
Using Functions in Pipelines (3:29)
Saving the best until last! See how Foundry's built-in functions can massively simplify complex transformations. We create a single function in Python that replaces almost every node in our Pipeline, publish the repo and check the resulting data against our manual version.
Section 4 Quiz

What this section will cover (1:03)
Pipeline Builder is Foundry's Extract-Transform-Load offering, and in this final key section, we explore Load. On the face of it, this means publishing the dataset, since Foundry automates the provisioning and partitioning that would have been done by teams of engineers previously. However, there are still a LOT of decisions you need to make. For example, do you want to schedule the pipeline to run regularly, or be driven by a trigger such as a data update in a source file? We explore how to set up unit tests and data validation checks, and see which results flow downstream into datasets, and which remain internal to Pipeline Builder. We see how to set access restrictions, and how to improve discoverability and usability of your dataset by colleagues, using metadata and markers, by looking at Datasets as well as Pipelines. Finally, we introduce the concept of branching, and explain how you and colleagues can collaborate on changes to the pipeline using a series of proposals and approvals. I think of the Load section as broadly administrative: not as fun as transformations but ultimately more important to get right.
Creating a Dataset with the Right Write Mode (5:53)
Configure and publish a pipeline output dataset and understand how choosing different write modes can radically alter the rows that are present in the final output. Learn that Foundry has automated many of the jobs traditionally performed by data engineers.
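A rough sketch of why the write mode matters, in plain Python. The three mode names below (`replace`, `append`, `append_new`) are hypothetical labels for generic behaviours and are not Foundry's actual write-mode names. The point is only that the same existing and incoming rows produce different outputs depending on the mode.

```python
def write(existing, incoming, mode):
    """Return the output rows after one build, under a given (hypothetical) write mode."""
    if mode == "replace":
        # The output contains only the latest build's rows.
        return list(incoming)
    if mode == "append":
        # Every incoming row is added, even if it is already present.
        return list(existing) + list(incoming)
    if mode == "append_new":
        # Only rows not already present are added.
        return list(existing) + [row for row in incoming if row not in existing]
    raise ValueError(f"unknown mode: {mode}")


existing = [{"id": 1}, {"id": 2}]
incoming = [{"id": 2}, {"id": 3}]

replaced = write(existing, incoming, "replace")
appended = write(existing, incoming, "append")
appended_new = write(existing, incoming, "append_new")
print(len(replaced), len(appended), len(appended_new))
```

With these inputs the three modes leave 2, 4 and 3 rows respectively, so a downstream consumer would see lost history, a duplicate, or neither, purely as a result of the mode chosen.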
Error Handling and Data Quality (6:34)
Understand the five data expectations we can set on our data, and other validation tools such as unit tests that we can leverage to detect issues and stop incorrect data from publishing.
Using Deployment Schedules (3:47)
Learn that deployment can be scheduled regularly or be driven by data changes in the source files. See how to do both. Understand that a dataset must be both deployed and built before it can be scheduled. We also walk through the tabs you rarely need to use such as Build Settings.
Making Sense of Unit Tests (7:33)
Learn what a unit test is within Foundry and set one up to check our designator transformation. Understand where the test results flow to within Foundry.
Ensuring Usability of your Data via Datasets (5:49)
We learn Dataset navigation with a focus on improving the discoverability and usability of your data. We add descriptions and typeclasses for columns. Understand how this data can help AIP and directly benefit your colleagues.
Exploring Dataset-level Metadata (7:41)
We walk through the Dataset tabs, including the ability to compare datasets, and we end with the Details tab, which lets you add dataset-level metadata and descriptions. We compare the benefits of adding metadata via the interface vs. using JSON. We see these changes flow into the dataset summary.
Understanding Dataset Access and Security (4:46)
We learn that, by default, Foundry is restrictive with permissions, and you need quite powerful rights to relax these settings. Configure dataset permissions, understand role-based access, and apply data markings to protect sensitive information. See how to use the Check Access tab for a specific user.
Using Branches in Pipeline Builder (7:01)
We introduce the concept of branching, and create a branch in Pipeline Builder. We see how this flows into Dataset branching and understand the process for proposing and approving changes to Pipelines.
Section 5 Quiz

Requirements

No programming experience is needed. To get the most from this course, it would help if you have access to Foundry, so you can practise key concepts. You must be comfortable working with websites and data, such as spreadsheets.

Description

Building reliable, well-structured data pipelines is at the heart of working effectively in Palantir Foundry. In this hands-on course, you’ll learn how to take raw data from a variety of sources, transform it with confidence, and publish clean, trustworthy datasets that are ready for real-world use.

We begin by exploring the Foundry Pipelines interface so you can navigate with ease and understand how pipelines, datasets, and transforms fit together. You’ll learn how to import data from within Foundry, upload private files, and onboard publicly available datasets, giving you a solid foundation for working with different types of inputs.

From there, we dive into transforming your data. You’ll work with JSON and geoJSON, troubleshoot extraction issues, reshape data using splits and static columns, combine logic with Regex and conditional expressions, and enrich your pipelines with joins, unions, and built-in functions. Along the way, you’ll develop a strong mental model of how schemas evolve and how transforms chain together.

Next, you’ll learn how to load and operationalise your work. We cover write modes, scheduling, error handling, data expectations, and unit tests so you can ensure quality before publishing. You’ll also learn how to improve dataset usability by adding metadata, managing access, and applying best-practice governance.

Finally, we touch on branching to help you develop changes safely in a controlled environment.

By the end of this course, you’ll be able to confidently build, test, and publish complete data pipelines in Foundry, collaborating with colleagues along the way.

Who this course is for:

This course is designed for analysts, data engineers, and anyone working with Foundry who wants to learn how to build, transform, and publish data pipelines confidently.

Course content

Introduction: 1 lecture • 1min

Getting Started: 3 lectures • 13min

Extract: 4 lectures • 16min

Transform: 10 lectures • 52min

Load: 9 lectures • 50min

Conclusion: 1 lecture • 1min