
Get an overview of what this course covers, who it's for, and how you'll learn to build and operate data pipelines using Foundry.
Learn what Foundry Pipelines are, how they fit into the wider Foundry ecosystem, and why they provide a powerful framework for transforming raw data into usable datasets.
Explore the Pipeline Builder interface, including key panels, the graph view, and where to find inputs, outputs, transforms, and build information.
Learn how to navigate Foundry's Ontology structure, understand the difference between input and output data folders, and walk through the creation of a new empty pipeline, choosing between the Batch, Standard, Streaming, Spark, DataFusion and External processing options.
If Pipeline Builder is Foundry's Extract-Transform-Load offering, this section focuses on Extract. We will look at how to get data into the pipeline, starting with the easiest option, which is importing an existing Foundry dataset. Then we move on to connecting to an externally hosted dataset. While conceptually simple, we see the many ways in which such a connection can fail, and identify the easiest way to get public data into Foundry. Then we move on to private connections by trying to access password-protected data. We explain the difference between Authorization, Egress policies and Host keys, all of which are required by Foundry to bring external private data into Foundry.
Learn how to import existing Foundry datasets into a Pipeline, including the many types of supported data. Learn the difference between Snapshot and Incremental computation.
Foundry is not set up to handle public data; it tends to assume all data is private. In this lecture, learn the easiest ways of importing externally available public datasets into Foundry.
We move on to private connections by trying to access password-protected data. We explain the difference between Authorization, Egress policies and Host keys, all of which are required by Foundry to bring external private data into Foundry.
Pipeline Builder is Foundry's Extract-Transform-Load offering. In this section, we focus on arguably the most fun part: the transformations. We start by converting JSON files into tabular form using built-in handlers. Then, once we have some data in our pipeline, we explore all the new commands available to us. We build increasingly complex transformations, starting with splitting datasets and adding columns based on cell values, moving on to chaining transformations, and ending by combining regular expression pattern matching with ANDs and ORs to build conditional logic. We also merge datasets in two ways, with joins and unions. At the end, we see how much time and how many errors a function can save: one function can easily replace scores of manually created transformation nodes.
Learn how to parse and manipulate JSON fields inside your pipelines and explore the different transformation options available in Foundry. We use a snippet of JSON to let Foundry auto-generate the schema.
In this lecture, we take a raw GeoJSON dataset and walk through the process of converting it into a clean tabular form inside Foundry. You'll see why using a full JSON snippet to generate a schema produces incorrect columns, and how to fix it by understanding the quirks of GeoJSON.
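To make the shape of the problem concrete, here is a minimal sketch in plain Python, outside Foundry, of what "tabular" means for GeoJSON. It relies only on the standard GeoJSON layout, in which records sit inside a top-level `features` array and each record splits its fields between `properties` and `geometry`. The sample data and the `features_to_rows` helper are hypothetical illustrations, not the course dataset or a Pipeline Builder API.

```python
import json

# Hypothetical sample; the dataset used in the course is not reproduced here.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Alpha", "designator": "A1"},
     "geometry": {"type": "Point", "coordinates": [-0.12, 51.5]}},
    {"type": "Feature",
     "properties": {"name": "Bravo", "designator": "B2"},
     "geometry": {"type": "Point", "coordinates": [2.35, 48.85]}}
  ]
}
"""


def features_to_rows(geojson_text):
    """Return one flat dict (row) per GeoJSON feature."""
    collection = json.loads(geojson_text)
    rows = []
    for feature in collection["features"]:
        # Each property becomes a column of its own.
        row = dict(feature.get("properties") or {})
        geometry = feature.get("geometry") or {}
        row["geometry_type"] = geometry.get("type")
        # Coordinates stay as a JSON string so every row has the same columns.
        row["coordinates"] = json.dumps(geometry.get("coordinates"))
        rows.append(row)
    return rows


rows = features_to_rows(GEOJSON_TEXT)
for row in rows:
    print(row)
```

The point of the sketch is that the rows live one level down, under `features`, so a schema inferred from the whole document describes the wrapper rather than the records.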
Now that our pipeline contains data, we have a host of new commands open to us. This lecture explains those commands, including how to import a simple Python function from repos. Although we don't go into detail on functions, we learn why it is advisable to centralise some code across your organisation.
We use basic transforms to split our dataset based on a field, and to create new static and derived columns from values within the dataset.
We learn how to chain multiple expressions together inside a single transformation to streamline our logic and keep our pipeline clean. We see that each step of this process creates a new column in the Preview.
We start making more complex transformations by using pattern-matching techniques (regular expressions) and create conditional logic by combining AND/OR expressions.
Until this point, we have tested different aspects of designator validity. If there are numbers, are they in range? If there are letters, are they among the allowed values? Now we combine those tests into one final validity column.
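The same idea, separate checks combined with AND into one validity flag, can be sketched in plain Python. The designator format used here (two digits from 01 to 36, optionally followed by L, C or R) is an assumption made purely for illustration, as are the names `check_designator`, `DESIGNATOR_PATTERN` and `ALLOWED_LETTERS`. The course defines its own rules inside Pipeline Builder.

```python
import re

# Assumed format for illustration only: two digits plus an optional letter.
DESIGNATOR_PATTERN = re.compile(r"([0-9]{2})([A-Z]?)")
ALLOWED_LETTERS = {"L", "C", "R"}  # hypothetical allowed values


def check_designator(value):
    """Return the individual checks and the combined validity flag."""
    match = DESIGNATOR_PATTERN.fullmatch(value)
    if match is None:
        # The overall shape is wrong, so every check fails.
        return {"number_ok": False, "letter_ok": False, "is_valid": False}
    number, letter = match.groups()
    number_ok = 1 <= int(number) <= 36                     # numeric range test
    letter_ok = letter == "" or letter in ALLOWED_LETTERS  # OR: no letter, or an allowed one
    return {
        "number_ok": number_ok,
        "letter_ok": letter_ok,
        "is_valid": number_ok and letter_ok,               # AND: both tests must pass
    }


for value in ["09L", "27", "37R", "12X", "abc"]:
    print(value, check_designator(value))
```

Keeping the intermediate results alongside the final flag mirrors the lecture's approach of building one column per test before combining them, which makes it easier to see which rule a rejected value broke.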
We combine datasets using joins and unions, understanding that they are horizontal and vertical merges respectively. We also resolve mismatched schemas and learn the difference between left, right and inner joins.
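As a rough mental model, the difference between the two merges can be sketched in plain Python on lists of dicts. This is not how Pipeline Builder implements them; the `union` and `join` helpers and the sample rows are hypothetical. A right join is simply a left join with the two inputs swapped.

```python
def union(top_rows, bottom_rows):
    """Vertical merge: stack rows, filling columns missing from either side with None."""
    all_rows = top_rows + bottom_rows
    columns = []
    for row in all_rows:
        for column in row:
            if column not in columns:
                columns.append(column)
    return [{column: row.get(column) for column in columns} for row in all_rows]


def join(left_rows, right_rows, key, how="inner"):
    """Horizontal merge: add the right side's columns to matching left rows."""
    right_by_key = {}
    right_columns = []
    for row in right_rows:
        right_by_key.setdefault(row[key], []).append(row)
        for column in row:
            if column != key and column not in right_columns:
                right_columns.append(column)
    result = []
    for left in left_rows:
        matches = right_by_key.get(left[key], [])
        for match in matches:
            result.append({**left, **match})
        if not matches and how == "left":
            # A left join keeps unmatched left rows and pads the right columns.
            padding = {column: None for column in right_columns}
            result.append({**padding, **left})
    return result


sites = [{"id": 1, "name": "Alpha"}, {"id": 2, "name": "Bravo"}]
runways = [{"id": 1, "designator": "09L"}, {"id": 3, "designator": "27"}]

inner_rows = join(sites, runways, key="id", how="inner")
left_rows = join(sites, runways, key="id", how="left")
stacked_rows = union(sites, runways)
```

With this sample, the inner join keeps only the row whose `id` appears on both sides, the left join keeps both `sites` rows and pads the unmatched one with `None`, and the union returns four rows over the combined set of columns, which is one simple way of resolving mismatched schemas.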
Saving the best until last! See how Foundry's built-in functions can massively simplify complex transformations. We create a single function in Python that replaces almost every node in our Pipeline, publish the repo and check the resulting data against our manual version.
Pipeline Builder is Foundry's Extract-Transform-Load offering, and in this final key section, we explore Load. On the face of it, this means publishing the dataset, since Foundry automates the provisioning and partitioning that would previously have been done by teams of engineers. However, there are still a LOT of decisions you need to make. For example, do you want to schedule the pipeline to run regularly, or have it driven by a trigger such as a data update in a source file? We explore how to set up unit tests and data validation checks, and see which results flow downstream into datasets and which remain internal to Pipeline Builder. Looking at Datasets as well as Pipelines, we see how to set access restrictions, and how to use metadata and markers to make your dataset easier for colleagues to discover and use. Finally, we introduce the concept of branching, and explain how you and colleagues can collaborate on changes to the pipeline using a series of proposals and approvals. I think of the Load section as broadly administrative: not as fun as transformations but ultimately more important to get right.
Configure and publish a pipeline output dataset and understand how choosing different write modes can radically alter the rows that are present in the final output. Learn that Foundry has automated many of the jobs traditionally performed by data engineers.
Understand the five data expectations we can set on our data, and other validation tools such as unit tests that we can leverage to detect issues and stop incorrect data from publishing.
Learn that deployment can be scheduled regularly or be driven by data changes in the source files, and see how to do both. Understand that a dataset must be both deployed and built before it can be scheduled. We also walk through the tabs you rarely need to use, such as Build Settings.
Learn what a unit test is within Foundry and set one up to check our designator transformation. Understand where the test results flow to within Foundry.
We learn how to navigate a Dataset, with a focus on improving the discoverability and usability of your data. We add descriptions and typeclasses for columns. Understand how this metadata can help AIP and directly benefit your colleagues.
We walk through the Dataset tabs, including the ability to compare datasets, and we end with the Details tab, which lets you add dataset-level metadata and descriptions. We compare the benefits of adding metadata via the interface vs. using JSON. We see these changes flow into the dataset summary.
We learn that, by default, Foundry is restrictive with permissions, and you need quite powerful rights to relax these settings. Configure dataset permissions, understand role-based access, and apply data markings to protect sensitive information. See how to use the Check Access tab for a specific user.
We introduce the concept of branching, and create a branch in Pipeline Builder. We see how this flows into Dataset branching and understand the process for proposing and approving changes to Pipelines.
We summarise what we have learnt and suggest next steps.
Building reliable, well-structured data pipelines is at the heart of working effectively in Palantir Foundry. In this hands-on course, you’ll learn how to take raw data from a variety of sources, transform it with confidence, and publish clean, trustworthy datasets that are ready for real-world use.
We begin by exploring the Foundry Pipelines interface so you can navigate with ease and understand how pipelines, datasets, and transforms fit together. You’ll learn how to import data from within Foundry, upload private files, and onboard publicly available datasets, giving you a solid foundation for working with different types of inputs.
From there, we dive into transforming your data. You’ll work with JSON and GeoJSON, troubleshoot extraction issues, reshape data using splits and static columns, combine logic with regular expressions and conditional expressions, and enrich your pipelines with joins, unions, and built-in functions. Along the way, you’ll develop a strong mental model of how schemas evolve and how transforms chain together.
Next, you’ll learn how to load and operationalise your work. We cover write modes, scheduling, error handling, data expectations, and unit tests so you can ensure quality before publishing. You’ll also learn how to improve dataset usability by adding metadata, managing access, and applying best-practice governance.
Finally, we touch on branching to help you develop changes safely in a controlled environment.
By the end of this course, you’ll be able to confidently build, test, and publish complete data pipelines in Foundry, collaborating with colleagues along the way.