Importance of Cardinality
A free video tutorial from Prathap G
Technical Enterprise Architect
4.4 instructor rating • 3 courses • 11,086 students
Learn more from the full courseOracle SQL : Mastering Oracle SQL Performance Tuning
Oracle SQL Query Tuning and Architecture | Solve critical Oracle SQL query performance issues | Advanced SQL Tuning
06:26:10 of on-demand video • Updated October 2019
- Step by Step approach on troubleshooting badly performing SQL queries.
- Identify how internal Oracle components affects the performance of the SQL query.
- Identify potential performance bottlenecks by reading the Explain plan of the queries.
- Tips and Tricks to make the Oracle SQL query run faster.
- Compare BAD vs GOOD Queries with Examples.
- Understand how STATISTICS and CARDINALITY affects the performance of the SQL Query.
- Implications of JOINS and ACCESS Methods on the performance of the Query.
- Oracle Performance Tuning on both Oracle11g and Oracle12c Environment.
- Advanced SQL Query Tuning Steps.
English [Auto] Hey guys. Welcome back. In the previous lesson we have gone through Oracle statistics in detail. Now let's look at another important concept called cardinality which is an important component of the optimizer. So statistics and cardinality could be considered as the backbone of optimizer since we now know what is statistics. Let us understand cardinality in detail. Let me give you the same example that I gave you while explaining the statistics I assume that you have to go to the office from your home in one hour and you see that there is a quite a bit of traffic on the way. So how can you reach the office in one night. You need to have all the necessary information regarding the our level roads alternate routes traffic and on the roads and other relevant information that would help us to pick the right road to reach office within one day information that helps you to come up with the road is called statistics and the selection of the part. Using the statistics to select a route is called cardinality. So definition of cardinality is the estimation of the number of records retrieved to perform a specific operations within a query plan. It could be a full table access estimate index can estimate join estimate or any other estimate depending on the estimated value the optimizer would assign the cost of each operation and then aggregate the cost of all the operations in the query to come up with a total cost of the query the optimizer would choose the execution plan of the query with the lowest cost. So if the cardinality goes wrong on any of the operation that it has the potential to increase or decrease the overall cost of the greatly and affects the execution plan of the query which may even lead to performance issues. So we now know how important cardinality is. So the estimation of the number of regards our cardinality is purely a mathematical model and you will always get the same cardinality. As long as the input parameters for the mathematical model doesn't change but the mathematical model is part simple. Oracle has always played their best to build a mathematical model that works well most of the times on most of the machine on most of the conditions. Important thing to note here is that it is most of the times and not always with that background. Let me show fine examples on how the Oracle comes up with the cardinality and will also check the mathematical formula that it uses in those scenarios. In this case I would be using assets schema from Oracle to will see environment first. Let me copy all the quote is from the notepad. Could a sequel develop a session. Let me first create sales and it's called backup table by selecting all the data from the sales table. Let me execute the query that is now execute the first scenario. In this case I will run a query projecting my card count from the sales and the scope to backup. Let us execute the query we see that it return 9 1 8 8 4 2 in the cards. So this table has almost close to 1 million records and it does not check the execution plan of the query we see here. That cardinality of the estimated rows is many hundred and eighteen cards and also the actual gross return. But the query is also nine hundred and eighteen records we see here that the cardinality generated by the optimizer is perfect. Now let's do reverse engineering and see how it generates the cardinality using mathematical formula. Let us first check the number of record counts from the statistics table. Let's execute this query we see here that the value is 9 1 8 8 4 3. So the cardinality estimated value for the card is total rose to estimated ratio. In this case we have not selected a subset of the data from the table. There doesn't come any requirement to use that ratio variable. Let's use the value one here and let's execute the query. So this is exactly the cardinality consultation used for the query and it matches with the actual gross so the calculation used by the optimizer is total those multiplied by the estimated ratio. In this case estimated ratio values when we get a cardinality value as 9 1 8 8 4 3. Let's just second scenario. Now let us use a filter condition in this case I use custody as the condition I would be running the same could even three different values. Let's see what is the actual output generated by the query. And also the cardinality estimated by the optimizer in all the three scenarios. Let's run the query with custodial value as 1 6 6 0 we see that it returns 60 of the cards no let us check the explained plan to see the estimated because we see that the optimizer estimated the value as one the instead of 60. Please remember of the estimated value and it is 120 in our case let us do the second query with custodial value as 987 we see that in return 1 8 records. Now let us gently explain plan to see the estimated records we see that the optimizer estimated the value as one instead of 180. We see here that the estimated value remains the same as that of the old query which is 130 let us get the third query. With custody. Value as 1 9 4 8 FEC that it returned to different records. Now let us check the explained plan to see the estimated records we again see here that optimizer estimated the value as 120 instead of 245. So in all the three cases with three different granules the optimize has estimated the value as one that the LeT not check the math formula that was used by the optimizer to come up with this number. Since we're losing custody Carlo let me check the column Statistics data. If you look at the query I am actually selecting the distinct values for the custody from the table. And I'm also generating the density value of using the simple calc 1 divided by a number of distinct values. Let me execute equally let's copy the density value and use it as a ratio in the cardinality formula let's run the query we see here that it returned 130. This is exactly what was used as the estimated number of records by the optimizer in all the three cases the calculation used in this case is total rows multiplied by the density of the filter. Pretty good column since the filter. Pretty good column we use do not is his custom or I.D. it calculates the density of the column using the column statistics that were gathered as part of statistics generation. Let's test third scenario. Now let us use to fill good condition. In this case I use customer and channel lady as a filter condition. I would be running the same query with three different values and let's see what would be the actual output generated with a query and and we'll compare the cardinality estimated by the optimism. Let's run the first query with custodial value as 1 6 6 0 and translated as fight we see here that it returns 0 records. Now let us check the explain. Glad to see the estimated records we see that the optimizer estimated the value as 33 instead of 0. Please remember the estimated random number in this case the value is 3 it does now run the second query with Castile in line eighty seven and channel in value as for receipt that it returns it or records it does no check the explained plan of the query we see that again the optimizer estimated the value as 33 instead of 0 so the estimated value remained the same. Let us run the third query with custodial value as 1 9 4 8 and translated as 3 this time it returned 188 records. Now let us check the explained plan to see the estimated card count we see that optimizer estimated the value as 33 instead of 188. So again the estimated value would remain the same in all the three cases. So one thing which is very important is that irrespective of the customer value and channel 80 value the optimizer always estimated the data is 33 and just checked the mathematical formula that was used by the optimizer to come up with this number. Seems we are losing customer loyalty column. Let me check the column statistics data. In this query I will select a number of distinct values and also will generate the density data using the calc 1 divided by a number of distinct values. Let me execute the query let us copy this tentative value and we'll be using this value later as part of cardinality formula since we are also using channel any column. Let me check the identity value from the column Statistics data. Let's run this query I know get the value as zero point to five let me copy both identity values and use it in the calculation once copied. Let us run the query we see that the value returned is 33. This is exactly what was used as the estimated number of records. The calculation used in this case is total number of rows multiplied by the density of the first predicate column and then multiplied with density of the second filter the predicate column. Let's test fourth scenario for that. Let me create a new table called my index card table underscore 1 by selecting the data from all underscore objects. I am also populating two new columns namely column 1 and Column 2 Column 1 contains the raw number information and column 2 contains the negative of one on one. Let us know select the data from the newly created table we see in that column 1 and column 2 are always mutually exclusive. And adding the data between those two columns will always result in a value of zero. Important thing to note here is that it is only V who knows that adding both the columns will always result in a value of zero whereas Oracle does not have this information because the stats will be generated Oracle generates the stats for each individual column and it doesn't maintain the relationship between two different columns. Let me know generate the stats on the newly created table let us check if the stats up to date had or not. Let me check user underscore tab and its quote statistics table. We see that the number of rows in the table has plus taxes seventy seven thousand seven hundred and twenty eight. Let us know check column statistics we see here that column statistics along with his two grams are generated for this table. Let me execute the first query note here I am adding column 1 and column 2 and dividing with 2 and checking if the data is good at 100 basically what I am trying to do is creating a complicated arithmetic operation which with two columns. In this case Oracle finds it difficult to predict the estimated rows. We see that it returns zero records not only does check the explained plan to see the estimated records we see that optimizer estimated the value as three a date 6 instead of 0. Please remember the estimated value number in this case 3 8 2 ticks let me execute the second query here. I'm adding column 1 and column 2 and instead of dividing it with 2 and multiplying it for 2 and checking if the data is good or 100 again we see that it to zero records. Let us check the explain plan of the quarry we see that optimism estimated the value of 3 8 8 6 0 0. So again we see here that estimate records is the same. Let me execute a third query here I'm adding column 1 and column 2 and multiplying it 2 and checking if the data is greater than 50000 again. We see here that data are returned to zero. Let us know check the explain line of the query we see that optimize an estimated value as three a page six instead of zero. So again the estimated number of records is the same in all the three cases. So how did Oracle come up with the number three eighty six in this case before that. Let us run this query we see that total number of records to the table is 7 7 7 2 8 and 5 percent our third is 3 8 8 6 and this is our estimate a number. So in this case Oracle simply used 5 percent as the estimated value. So in this case where there was a quite a complicated arithmetic explanation Oracle was not able to come up with a good estimate a number based on the formula. So it hardcoded the estimated value as 5%. Let's test fifth scenario in this case. I will be using inequality operator in the particular condition. So in this case I would be using greater than operator on Channel 8 economy before running the actual queries. Let me run some queries and still the result of failed examples because this information is really helpful for us to come up with a good mathematical formula that is used by the optimizer it does first check the number of distinct challenges that do exist to the table we see here that there are four distinct values namely two three four and nine. So this value or someplace. Let us know check the minimum and maximum value we see here that minimum value is two and the maximum value is nine. Still this value or someplace that does it under query the challenge is greater than 5 we see that it returns two thousand seventy four records let us know check the explained plan of the we see that optimizer has estimated the value as 525 k instead of two thousand seventy four we see that data is actually quite a difference between the actual and estimated values and this is definitely not desirable. So how did optimize came up with this number before that. Let me ask you this query we see here that the output of the query is 525 K this is exactly the same value as for optimizer estimated. Let us look at the farm line data. If you see the formula number of rows multiplied by high value in this predicate value divided by high value minus low value high value. In this case is the highest value that distinctly. Since we had four value to come at three or four common nine the higher value is nine. So whenever there is high value replace that with nine and the predicate value is the actual selected predicate that is being used in the query since I used to query that channel 80 greater than 5 by predicate value in this case is 5. And in the low value is the lowest value that do exist with the SEALs and the score back up to the lowest channel any value is to replace the low value as to so on replacing all the values that the number that received is the actual estimate a number that optimizer used. So as you can see here right four different combinations optimizer users different mathematical formulas the mathematical formula used by operator or optimizer in this case is actually a bit complicated. So guys I've shown you five different cases which are simple enough and we have already seen that it is quite a bit of difference in the estimated actual values. As I've already told you that the mathematical formula works good for most of the cases and for most of the scenarios and may not work on all the scenarios are uncommon cases. That is if the query that is being executed is very complex involving LOC giants and lot of multiple conditions along with mathematical expressions it would become very difficult for the optimizer to come up with a good cardinality and relate to performance issues guys from this lesson. We now know how important statistics and cardinality are to come up with a good execution plan. So next time whenever you see a performance issue with a query the first thing that you need to do is to check the statistics and cardinality information. With that let me wrap this lesson and see you in the next lesson.