Informatica Performance Tuning and Optimization Techniques
- 3.5 hours on-demand video
- 3 downloadable resources
- Full lifetime access
- Access on mobile and TV
- Certificate of Completion
- Push Down Optimization.
- Partitions in Informatica.
- Dynamic Session Partitions.
- Concurrent Workflow Execution.
- Load Balancing for Workflow Distribution on GRID.
- Steps to determine the best way to improve performance of the existing Informatica Objects.
- Iterative methods of identifying one performance bottleneck at a time and eliminating it.
- Optimization techniques for Flat File Targets.
- Optimization techniques for Flat File Sources.
- Optimization techniques for Mappings.
- Optimization techniques for Datatype Conversions.
- Optimization techniques for Expression Transformation.
- Optimization techniques for Aggregator Transformation.
- Optimization techniques for Custom Transformation.
- Optimization techniques for Joiner Transformation.
- Optimization techniques for Lookup Transformation.
- Optimization techniques for Normalizer Transformation.
- Optimization techniques for Sequence Generator Transformation.
- Optimization techniques for Sorter Transformation.
- Optimization techniques for Source Qualifier Transformation.
- Optimization techniques for SQL Transformation.
- Optimization techniques for the Sessions.
- Optimization techniques for Grid Deployment.
- Optimization techniques for the Operating Systems.
- Optimizing Buffer Block Sizes.
- Optimizing Cache Sizes for cache-using transformations.
- You should know Informatica Power Center
- ETL Methodologies using Informatica Power Center
The purpose of this course is to walk you through the steps to identify performance bottlenecks in Informatica Power Center and the different optimization techniques to fix them. After completing this course, you will have a working understanding of the different performance bottlenecks and how to identify them using the iterative approach and thread statistics, along with the best practices and optimization techniques to consider when designing new mappings or fine-tuning existing ones.
- ETL Developers
- ETL Testers/QA Analysts
- Informatica Administrators
- UNIX Administrators managing the Data Warehouse Environments
- Relational DBAs managing the Data Warehouse Environments
- Windows Administrators managing the Data Warehouse Environments
This lecture discusses what the goal of any performance tuning exercise should be.
If the volume of data grows or you add more CPUs, you may need to adjust the number of partitions so that session run time does not increase. With dynamic partitioning, the Integration Service determines the number of partitions to create at run time based on factors such as source database partitions or the number of nodes in a grid.
Types of Dynamic Partitioning:
1. Based on Source Partitioning - Determines the number of partitions using database partition information. The number of partitions is the maximum of the number of partitions at the source.
2. Based on number of CPUs - Sets the number of partitions equal to the number of CPUs on the node that prepares the session. If the session is configured to run on a grid, dynamic partitioning sets the number of partitions equal to the number of CPUs on the node that prepares the session multiplied by the number of nodes in the grid.
3. Based on number of Nodes in Grid - Sets the partitions to the number of nodes in the grid running the session. If you configure this option for sessions that do not run on a grid, the session runs in one partition and logs a message in the session log.
4. Based on No. of Partitions - Sets the partitions to a number that you define in the Number of Partitions attribute. Use the $DynamicPartitionCount session parameter, or enter a number greater than 1.
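The four dynamic-partitioning methods above can be summarized in a small sketch. This is not Informatica code; it is a hypothetical Python model of how the partition count might be derived under each method (all names and parameters are illustrative):

```python
def dynamic_partition_count(method, *, source_partitions=0, cpus_per_node=1,
                            grid_nodes=1, on_grid=False, configured=1):
    """Model how the partition count could be derived for each
    dynamic-partitioning method (illustrative only)."""
    if method == "source":
        # Maximum number of partitions reported by the source database.
        return max(source_partitions, 1)
    if method == "cpus":
        # CPUs on the node that prepares the session; on a grid,
        # multiplied by the number of nodes.
        return cpus_per_node * (grid_nodes if on_grid else 1)
    if method == "nodes":
        # One partition per grid node; off-grid, the session runs
        # in a single partition.
        return grid_nodes if on_grid else 1
    if method == "configured":
        # Number of Partitions attribute (or $DynamicPartitionCount).
        return configured
    raise ValueError(f"unknown method: {method}")

# Example: an 8-CPU node on a 3-node grid, partitioning by CPUs.
print(dynamic_partition_count("cpus", cpus_per_node=8, grid_nodes=3, on_grid=True))  # 24
```

The sketch only models the arithmetic; the actual Integration Service behavior depends on the session and grid configuration.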
The pushdown optimization technique in Informatica pushes part or all of the transformation logic to the source or target database. The Integration Service translates the transformation logic into SQL queries and sends those SQL statements to the database. The source or target database then runs those SQL queries to process the transformation logic.
The amount of transformation logic you can push to the database depends on the database, transformation logic, mapping and session configuration. The Integration Service processes all transformation logic that it cannot push to a database.
To preview the SQL statements and mapping logic that the integration service pushes to the source or target database, use the pushdown optimization viewer.
The Integration Service analyzes which transformation logic it can push to the database, executes the generated SQL statements against the source or target tables, and itself processes any transformation logic that it cannot push to the database.
There are three types of pushdown optimization techniques that you can configure in a session. They are:
- Source-side Pushdown Optimization
- Target-side Pushdown Optimization
- Full Pushdown Optimization
Source-Side Pushdown Optimization:
The Integration Service first analyzes the mapping from the source toward the target until it reaches a downstream transformation that it cannot push to the source database. It then generates a SELECT statement, executes it in the source database, reads the result rows, and processes the remaining transformations.
Target-Side Pushdown Optimization:
The Integration Service first analyzes the mapping from the target toward the source until it reaches an upstream transformation that it cannot push to the target database. It then generates INSERT, UPDATE, or DELETE statements. The Integration Service processes the transformation logic up to the point where it can push the remaining logic to the database, then executes the generated SQL on the target database.
Full Pushdown Optimization:
You can use full pushdown optimization only when the source and target databases are in the same relational database management system.
The Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database. If the Integration Service cannot push all transformation logic to the target database, it tries to push all transformation logic to the source database.
If it cannot push all transformation logic to the source or target, the Integration Service pushes as much transformation logic as possible to the source database, processes the intermediate transformations that it cannot push to any database, and then pushes the remaining transformation logic to the target database. The Integration Service generates and executes an INSERT SELECT, DELETE, or UPDATE statement for each database to which it pushes transformation logic.
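As a concrete illustration of source-side pushdown, consider a hypothetical mapping with a Filter and a simple Expression downstream of the Source Qualifier. The Python sketch below (illustrative names only, not the SQL the Integration Service actually generates) shows how such logic can collapse into a single SELECT that the source database executes:

```python
def push_to_source_sql(table, columns, filter_cond=None, derived=None):
    """Build the kind of SELECT a source-side pushdown might generate:
    a Filter becomes a WHERE clause, and a simple Expression becomes a
    derived column in the select list (illustrative sketch)."""
    select_list = list(columns)
    if derived:
        # e.g. {"FULL_NAME": "FIRST_NAME || ' ' || LAST_NAME"}
        select_list += [f"{expr} AS {alias}" for alias, expr in derived.items()]
    sql = f"SELECT {', '.join(select_list)} FROM {table}"
    if filter_cond:
        sql += f" WHERE {filter_cond}"
    return sql

# Hypothetical mapping: filter active customers, concatenate a full name.
sql = push_to_source_sql(
    "CUSTOMERS", ["CUST_ID"],
    filter_cond="STATUS = 'ACTIVE'",
    derived={"FULL_NAME": "FIRST_NAME || ' ' || LAST_NAME"},
)
print(sql)
# SELECT CUST_ID, FIRST_NAME || ' ' || LAST_NAME AS FULL_NAME FROM CUSTOMERS WHERE STATUS = 'ACTIVE'
```

In a real session you would verify the actual generated SQL with the Pushdown Optimization Viewer rather than reasoning about it by hand.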
- Only use SQL Overrides if there is a substantial performance gain or complexity decrease. SQL Overrides need to be maintained manually and any changes to the data structure will result in rewriting or modifying the SQL Override.
- Do use the WHERE condition and SORTED ports in the Source Qualifier if possible, rather than adding a filter or a sorter transformation.
- Delete unused ports / only connect what is used. Reducing the number of records used throughout the mapping provides better performance by minimizing the amount of data moved.
- Tune source qualifier queries to return only the data you need.
- Perform large lookups in the Source Qualifier instead of through a traditional lookup.
- When applicable, generate the default SQL in the Source Qualifier and use the ‘Validate’ option to verify that the resulting SQL is valid.
- Calculate once, use many times. Avoid calculating or testing the same value over and over. Calculate it once in an expression, and set a true/false flag. Within an expression, use variables to calculate a value used several times.
- Create an anchor expression transformation that will map the source table to an intermediary transformation using the source column names. Do simple processes (LTRIM/RTRIM, string/numeric conversions, testing for NULL, etc.) in this transformation. This will enable an easier transition if the source table changes in the future.
- Watch your data types. The engine will automatically convert compatible types. Sometimes conversion is excessive and happens on every transformation which slows the mapping.
- Expression names should begin with "EXP" followed by descriptive words
- Do not propagate ports out of an Expression transformation if they are not used in the mapping going forward.
- Group input/output ports first, followed by variable ports and then by output ports. Incorrectly ordering the ports in an Expression transformation can lead to errors and/or inaccurate results.
- Avoid using the $Source and $Target variables in the Lookup Connection Information. Connection names have been set up to be generic across the Production and Test environments. If possible, set the Connection Information in the Lookup transformation to one of these non-level-specific connection names.
- Set the connections in the session for ease of migration.
- Do not include any more ports in the Lookup other than necessary. Reducing the amount of data processed provides better performance.
- Avoid datetime comparisons in lookups; replace them with string comparisons where possible.
- Not all sources and targets treat strings with leading or trailing blanks the same. It may be necessary to RTRIM and LTRIM string data prior to using it in a Lookup.
- Lookups on small tables (<10,000 records) can be cached and can use ODBC. Lookups on large tables should also be cached. As a general practice, do not use uncached lookups.
- In place of lookups, tables can be joined in the source qualifier. However, this often necessitates left joins, which can complicate source qualifiers. Weigh performance against ease of maintenance when deciding between source qualifier joins and lookups.
- Do not use Aggregators for simple sorting; use the sorter transformation or the SORTED ports option of the Source Qualifier.
- Minimize aggregate function calls by using “group by”.
- Do place Aggregators as early in the mapping as possible, as they reduce the number of records being processed, thereby improving performance.
- Wherever possible, sort incoming data to an Aggregator and use the ‘Sorted Input’ option to improve performance.
- Place Filters as early in the mapping as possible, as they reduce the number of records being processed, thereby improving performance.
- Use a Filter to screen rows that would be rejected by an update strategy. (Rejected rows from an update strategy are logged to the bad file, decreasing performance.)
- If you have an Aggregator transformation in the mapping, place a Filter before it to avoid aggregating unnecessary rows.
- Routers may not be the best choice if load order of the target(s) is important since it is not possible to control the load order of the legs from a router.
- The target load method(s) must be carefully chosen when using routers, especially if the data is loading to the same target, in order to avoid table locks and ensure that the data is loaded in the correct order.
- Whenever possible, perform joins in the database, i.e. in the Source Qualifier itself.
- Whenever possible, sort incoming data to a Joiner transformation and use the ‘Sorted input’ option to improve performance.
- To improve the performance of an unsorted Joiner transformation, designate the source with fewer rows as the ‘Master’.
- Do not code update strategies when all rows sent to the target are inserts.
- Do include an update strategy when all rows sent to the target are updates, unless a proof of concept shows that there is performance degradation. This adds clarity to the mapping for future developers.
- Rejected rows from an update strategy are logged to the bad file. Consider filtering if retaining these rows isn’t critical due to the performance hit caused by logging.
- Avoid loading to the same target between different data flows where possible.
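Several of the expression tips above, in particular “calculate once, use many times” and setting a true/false flag in a variable port, can be sketched outside Informatica. The Python below is a hypothetical row-processing loop (all field names are illustrative) showing a flag computed once per row and reused in every downstream “output port” instead of repeating the comparison:

```python
rows = [
    {"amount": 120.0, "region": "EMEA"},
    {"amount": 40.0, "region": "APAC"},
]

out = []
for row in rows:
    # Variable-port style: evaluate the condition once per row...
    is_large = row["amount"] > 100  # analogous to a variable port
    # ...then reuse the flag in each derived field, rather than
    # repeating `row["amount"] > 100` in every expression.
    out.append({
        "tier": "LARGE" if is_large else "SMALL",
        "discount": 0.10 if is_large else 0.0,
        "region": row["region"],
    })

print(out[0]["tier"], out[1]["discount"])  # LARGE 0.0
```

The saving per row is small, but in an Expression transformation evaluated millions of times, removing redundant condition tests adds up.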