janvier
2011
D’habitude pour faire une estimation de la taille d’une base de données j’utilise l’outil POWER AMC. En fonction de différents paramètres : nombre de lignes dans chaque table, index, … POWER AMC indique une estimation de la taille de la base. Mais la méthode de calcul n’est pas affichée par POWER AMC …
En fouillant sur le net, je suis tombé sur une procédure écrite par Tim Laqua pour faire une estimation de la taille d’une table; la procédure stockée de Tim Laqua est basée sur un mode de calcul clair proposé par Microsoft qui est ici
Dans ce billet, nous allons comparer les résultats donnés par la procédure stockée de Tim Laqua et l’outil POWER AMC 9.5
Voici la DDL de la table de test (cette table est tiré du livre SQL Server 2008 Internals – page 223)
Pour des raisons de simplicité, nous n’allons pas poser de clé primaire ni d’index sur cette table.
CREATE TABLE customer
(
cust_id smallint NULL,
cust_name varchar(50) NULL,
cust_addr1 varchar(50) NULL,
cust_addr2 varchar(50) NULL,
cust_city varchar(50) NULL,
cust_state char(2) NULL,
cust_postal_code varchar(10) NULL,
cust_phone varchar(20) NULL,
cust_fax varchar(20) NULL,
cust_email varchar(30) NULL,
cust_web_url varchar(100) NULL
)
Avec 1000 lignes dans la table voici le résultat fournit par POWER AMC 9.5
–POWER AMC 9.5
Estimation de la taille de la base de donnes "ModeleConceptuelDonnees_1"…
Nombre Taille estime Objet
————————- ———————– —————————————————-
1.000 405 Ko Table "customer"
————————- ———————– —————————————————-
405 Ko Espace total estimé
Nous allons maintenant utiliser la procédure de Tim Laqua. cette procédure est détaillée ici
–> La table de définition des types : les types Geography et Geometry ne sont pas pris en compte
CREATE TYPE TableColumnDefinition AS TABLE
([DataType] VARCHAR(16) NOT NULL
,[Length] INT
,[IsKey] BIT
,[DataLength] AS
CASE
WHEN [DataType] = 'BIT' THEN 0.125
WHEN [DataType] = 'TINYINT' THEN 1
WHEN [DataType] = 'SMALLINT' THEN 2
WHEN [DataType] IN ('INT', 'SMALLMONEY', 'REAL', 'SMALLDATETIME') THEN 4
WHEN [DataType] IN ('BIGINT', 'DATETIME', 'MONEY') THEN 8
WHEN [DataType] = 'UNIQUEIDENTIFIER' THEN 16
WHEN [DataType] = 'NVARCHAR' THEN [Length] * 2 + 2
WHEN [DataType] = 'VARCHAR' THEN [Length] + 2 -- DOESN'T SUPPORT MAX
WHEN [Datatype] = 'DATE' THEN 3
WHEN [Datatype] = 'TIME' THEN 5 -- Worst case
WHEN [Datatype] = 'DATETIME2' THEN 8 -- Worst case
WHEN [Datatype] = 'DATETIMEOFFSET' THEN 10 -- Worst case
WHEN [DataType] = 'CHAR' THEN [Length]
WHEN [DataType] = 'NCHAR' THEN [Length] * 2 + 2
WHEN [DataType] = 'BINARY' THEN [Length]
WHEN [DataType] = 'VARBINARY' THEN [Length] + 2
WHEN [Datatype] = 'TIMESTAMP' THEN 8
WHEN [DataType] = 'XML' THEN [Length] -- Estimate an average size in bytes, 2GB max
WHEN [DataType] IN ('DECIMAL', 'NUMERIC') THEN
CASE
WHEN [Length] BETWEEN 1 AND 9 THEN 5
WHEN [Length] BETWEEN 10 AND 19 THEN 9
WHEN [Length] BETWEEN 20 AND 28 THEN 13
WHEN [Length] BETWEEN 29 AND 38 THEN 17
END
WHEN [DataType] = 'FLOAT' THEN
CASE
WHEN [Length] BETWEEN 1 AND 24 THEN 4
WHEN [Length] BETWEEN 25 AND 53 THEN 8
WHEN [Length] IS NULL THEN 8 -- Default n is 53
END
ELSE 0
END
,[IsVariableLength] AS
CASE
WHEN [DataType] IN ('VARCHAR', 'NVARCHAR', 'VARBINARY') THEN 1
ELSE 0
END)
–> La procédure stockée de calcul
/*************************************************************************************
***
*** Procedure: [Utility_EstimateClusteredIndexSize]
*** Purpose: Estimates the amount of space needed for a given clustered index
*** Methodology: http://msdn.microsoft.com/en-us/library/ms178085.aspx
***
***
*** Author: Tim Laqua
*** Date Created: 2009-10-15
***
*** Revision History
*** Date Author Description
*** 2009-10-15 tl Created
*************************************************************************************/
ALTER PROCEDURE [dbo].[Utility_EstimateClusteredIndexSize]
(
@Num_Rows FLOAT
,@ClusteredIndexIsUnique BIT
,@Fill_Factor FLOAT
,@TableColumn TableColumnDefinition READONLY
)
AS
SET NOCOUNT ON
DECLARE
@Num_Cols FLOAT = 0 -- total number of columns (fixed-length and variable-length)
,@Fixed_Data_Size FLOAT = 0 -- total byte size of all fixed-length columns
,@Num_Variable_Cols FLOAT = 0 -- number of variable-length columns
,@Max_Var_Size FLOAT = 0 -- maximum byte size of all variable-length columns
,@Variable_Data_Size FLOAT = 0
DECLARE @VAR TABLE
([Name] VARCHAR(64)
,[VALUE] FLOAT
,[DisplayValue] AS
CASE
WHEN LOWER([Name]) LIKE '%size%' OR LOWER([Name]) LIKE '%space%' THEN
CASE
WHEN [VALUE] < 1024 THEN CAST(CAST(ROUND([VALUE], 3) AS NUMERIC(16,0)) AS VARCHAR(16)) + ' B'
WHEN [VALUE] < 1024*1024 THEN CAST(CAST(ROUND([VALUE]/1024, 3) AS NUMERIC(16,3)) AS VARCHAR(16)) + ' KB'
WHEN [VALUE] < 1024*1024*1024 THEN CAST(CAST(ROUND([VALUE]/1024/1024, 3) AS NUMERIC(16,3)) AS VARCHAR(16)) + ' MB'
ELSE CAST(CAST(ROUND([VALUE]/1024/1024/1024, 3) AS NUMERIC(16,3)) AS VARCHAR(16)) + ' GB'
END
ELSE CAST(FLOOR([VALUE]) AS VARCHAR(16))
END)
-- Leaf level data
SELECT
@Num_Variable_Cols = COUNT(1)
,@Max_Var_Size = ISNULL(SUM([DataLength]), 0)
FROM @TableColumn WHERE [IsVariableLength] = 1
SELECT
@Num_Cols = COUNT(1) + @Num_Variable_Cols
,@Fixed_Data_Size = CEILING(ISNULL(SUM([DataLength]), 0))
FROM @TableColumn WHERE [IsVariableLength] = 0
INSERT INTO @VAR VALUES ('@Fixed_Data_Size', @Fixed_Data_Size)
IF @ClusteredIndexIsUnique <> 0
BEGIN
SET @Num_Cols += 1
IF @Num_Variable_Cols > 0
SET @Num_Variable_Cols += 1
IF @Max_Var_Size > 0
SET @Max_Var_Size += 4
END
DECLARE @Null_Bitmap INT = CEILING(2 + ((@Num_Cols + 7) / 8))
INSERT INTO @VAR VALUES ('@Null_Bitmap', @Null_Bitmap)
-- if there are variable columns
IF @Num_Variable_Cols > 0
SET @Variable_Data_Size = 2 + (@Num_Variable_Cols * 2) + @Max_Var_Size
INSERT INTO @VAR VALUES ('@Num_Cols', @Num_Cols)
INSERT INTO @VAR VALUES ('@Num_Variable_Cols', @Num_Variable_Cols)
INSERT INTO @VAR VALUES ('@Max_Var_Size', @Max_Var_Size)
DECLARE @Row_Size FLOAT = CEILING(@Fixed_Data_Size + @Variable_Data_Size + @Null_Bitmap + 4 )
INSERT INTO @VAR VALUES ('@Row_Size', @Row_Size)
DECLARE @Rows_Per_Page FLOAT = FLOOR(8096 / (@Row_Size + 2))
INSERT INTO @VAR VALUES ('@Rows_Per_Page', @Rows_Per_Page)
DECLARE @Free_Rows_Per_Page FLOAT = FLOOR(8096 * ((100 - @Fill_Factor) / 100) / (@Row_Size + 2) )
INSERT INTO @VAR VALUES ('@Free_Rows_Per_Page', @Free_Rows_Per_Page)
DECLARE @Num_Leaf_Pages FLOAT = CEILING(@Num_Rows / (@Rows_Per_Page - @Free_Rows_Per_Page) )
INSERT INTO @VAR VALUES ('@Num_Leaf_Pages', @Num_Leaf_Pages)
DECLARE @Leaf_space_used FLOAT = 8192 * @Num_Leaf_Pages
INSERT INTO @VAR VALUES ('@Leaf_space_used', @Leaf_space_used)
-- Index Information
DECLARE
@Num_Key_Cols FLOAT = 0 -- total number of key columns (fixed-length and variable-length)
,@Fixed_Key_Size FLOAT = 0 -- total byte size of all fixed-length key columns
,@Num_Variable_Key_Cols FLOAT = 0 -- number of variable-length key columns
,@Max_Var_Key_Size FLOAT = 0 -- maximum byte size of all variable-length key columns
SELECT
@Num_Variable_Key_Cols = COUNT(1)
,@Max_Var_Key_Size = ISNULL(SUM([DataLength]), 0)
FROM @TableColumn
WHERE
[IsVariableLength] = 1
AND [IsKey] = 1
SELECT
@Num_Key_Cols = COUNT(1) + @Num_Variable_Key_Cols
,@Fixed_Key_Size = ISNULL(SUM([DataLength]), 0)
FROM @TableColumn
WHERE
[IsVariableLength] = 0
AND [IsKey] = 1
IF @ClusteredIndexIsUnique <> 0
BEGIN
SET @Num_Key_Cols += 1
IF @Num_Variable_Key_Cols > 0
SET @Num_Variable_Key_Cols += 1
IF @Max_Var_Key_Size > 0
SET @Max_Var_Key_Size += 4
END
INSERT INTO @VAR VALUES ('@Num_Variable_Key_Cols', @Num_Variable_Key_Cols)
INSERT INTO @VAR VALUES ('@Max_Var_Key_Size', @Max_Var_Key_Size)
INSERT INTO @VAR VALUES ('@Num_Key_Cols', @Num_Key_Cols)
-- Who would seriously have a nullable key column? ridiculous.
DECLARE @Index_Null_Bitmap FLOAT = 2 + ((@Num_Key_Cols + 7) / 8)
INSERT INTO @VAR VALUES ('@Index_Null_Bitmap', @Index_Null_Bitmap)
DECLARE @Variable_Key_Size FLOAT = 0
IF @Max_Var_Key_Size > 0
SET @Variable_Key_Size = 2 + (@Num_Variable_Key_Cols * 2) + @Max_Var_Key_Size
INSERT INTO @VAR VALUES ('@Variable_Key_Size', @Variable_Key_Size)
DECLARE @Index_Row_Size FLOAT = @Fixed_Key_Size + @Variable_Key_Size + @Index_Null_Bitmap + 1 + 6
INSERT INTO @VAR VALUES ('@Index_Row_Size',@Index_Row_Size )
DECLARE @Index_Rows_Per_Page FLOAT = FLOOR(8096 / (@Index_Row_Size + 2))
INSERT INTO @VAR VALUES ('@Index_Rows_Per_Page', @Index_Rows_Per_Page)
DECLARE @Non_Leaf_Levels FLOAT = 1
-- Log base @Non_Leaf_Levels hack
WHILE (@Num_Leaf_Pages / @Index_Rows_Per_Page) > POWER(@Index_Rows_Per_Page,@Non_Leaf_Levels)
SET @Non_Leaf_Levels += 1
SET @Non_Leaf_Levels += 1
INSERT INTO @VAR VALUES ('@Non_Leaf_Levels', @Non_Leaf_Levels)
-- Summation hack
DECLARE
@Num_Index_Pages FLOAT = 0
,@Summand INT = @Non_Leaf_Levels
WHILE @Summand > 0
BEGIN
SET @Num_Index_Pages += CEILING(@Num_Leaf_Pages/POWER(@Index_Rows_Per_Page, @Summand))
SET @Summand -= 1
END
INSERT INTO @VAR VALUES ('@Num_Index_Pages', @Num_Index_Pages)
DECLARE @Index_Space_Used FLOAT = 8192 * @Num_Index_Pages
INSERT INTO @VAR VALUES ('@Index_Space_Used', @Index_Space_Used)
DECLARE @Clustered_Index_Size FLOAT = @Leaf_Space_Used + @Index_Space_used
INSERT INTO @VAR VALUES ('@Clustered_Index_Size', @Clustered_Index_Size)
SELECT * FROM @VAR
–> Test de la procédure en utilisant la définition (DDL) de la table customer
DECLARE @TableColumn TableColumnDefinition
INSERT INTO @TableColumn
VALUES
('smallint',NULL,0)
,('VARCHAR',50,0)
,('VARCHAR',50,0)
,('VARCHAR',50,0)
,('VARCHAR',50,0)
,('CHAR',2,0)
,('VARCHAR',10,0)
,('VARCHAR',20,0)
,('VARCHAR',20,0)
,('VARCHAR',30,0)
,('VARCHAR',100,0)
EXEC [dbo].[Utility_EstimateClusteredIndexSize]
@Num_Rows = 1 – number of rows in the table
,@ClusteredIndexIsUnique = 0 – is it?
,@Fill_Factor = 90 – page fill factor
,@TableColumn = @TableColumn – definition of table columns
Résultat
——————————
@Row_Size 431 octets
@Rows_Per_Page 18
Pour les 1000 lignes, l'estimation de la taille de la base = 431 octets X 1000 = 421 Ko
–> Que peut-on dire ?
POWER AMC estime la taille de la base à 405 Ko et la procédure basée sur la méthodologie Microsoft affiche 421 Ko. à mon humble avis ces résultats sont proches.
Qu’en pensez-vous ?
————————————————————————-
Auteur : Etienne ZINZINDOHOUE
————————————————————————-