Search this Blog

Friday, May 17, 2019

Calculating Table Sizes Oracle

In this post I will discuss the growth of tables, and the ways to calculate the space used.
For the newbies among us, an Oracle databases exists of tablespaces, the defined space consisting of datafiles on disk, that hold the databases objects, like tables and indexes.

When a table grows, a tablespace can become full. The origin of this document is from one of these occassions where a tablespace is/was full and I was trying to free up space.

But first some theory:

How do tables grow in Oracle?

Well that's an easy question, one would think, but depending on the way a table is used, the table can either grow slowly, grow very fast, or keep approximately the same size.
  • When, in the last example, a table consists of very volatile data, so many inserts and deletes occur during a certain period of time, the database will use the space freed by the deletes for the new rows.
  • When there are more rows added than there are deleted, a table will grow, as new rows are added at the end of the table, behind the so-called high-water-mark of the table.
  • When an application uses a 'direct-insert' method, mostly seen in bulk-loads, the data is inserted after the high water mark anyway, so the database does in that case not look for free space within the existing occupied space.

That about how a table can grow, and of course this covers not all aspects that are involved.
We will address more dependencies, as we discuss the table's actual size.

First we need make a small sidestep and discuss "statistics".

Oracle database statistics contain information about your data. Statistics information consists e.g. of the number of rows in a table, the average length of a row, the spreading of the data in the different columns and more. Via this link, you can find "Oracle Database Statistics for Dummies".

Now that we now, we can find the number of rows in a table on another way than running a "select count(*)..", then we are able to calculate the size of the data within the table.
With that we can make a seperation between the actual physical size and the used size of a table.

Imagine this bowl being a table. When we have calculated statistics on this bowl, we know that there are -approx.- 100 rows (peas) in this table, with an average row-size of 1 pea.

The physical size of the bowl can be compared with the physical size of the table. This physical size consists of a 100 rows (peas) and a lot of free space, in this case air.

If we wouldn't have table statistics, we would not know the occupied size of the table, but only the physical size. In that case we would not be able to tell anything of the tables usage.

However when we do have statistics, we can tell more about the table's actual size. Oracle has options to shrink a table back to its used size, in order to free up space in the database but also to increase performance of the table, especially with inserts. If a table has a lot of free space, most of the time very fragmented, then the database possibly needs to spend lots of time to find a suitable empty spot in the table, where a row that is inserted would fit.
If we shrink the table to its used size, the database does not need to spent much time on looking for a free place and put the row at the end of the table. This is of course valid for so long, as there will be done new deletes again, which free up usable space again.

Now imagine this situation of a huge table, having a lot of free space in it. The below query shows us that the table INVOICES only has 49% of actual used space:

select ds.owner 
     , ds.segment_name tabname
     , ds.bytes
     , dt.num_rows
     , dt.avg_row_len
     , dt.num_rows * dt.avg_row_len usedsize_bytes
     , round(dt.num_rows * dt.avg_row_len / ds.bytes * 100,1) "%used"
from   dba_segments ds
     , dba_tables dt
where      dt.tablespace_name = 'DATA'
       and ds.tablespace_name = dt.tablespace_name
       and dt.owner = ds.owner
       and dt.table_name = ds.segment_name;

Result:
OWNER   TABNAME         BYTES   NUM_ROWS AVG_ROW_LEN USEDSIZE_BYTES      %used
------- -------- ------------ ---------- ----------- -------------- ----------
FINANCE INVOICES   9768534016    1204707        3976     4789915032         49


Now we can use the 'alter table shrink space' command to reduce the physical size of the table. 

SQL> alter table FINANCE.INVOICES enable row movement;
SQL> alter table FINANCE.INVOICES shrink space; 

This will take some time, depending on the size of a table, but it will shrink the table to its smallest possible size. That is, when everything goes as expected. In our case it didn't. When running the above query again, the numbers now showed:

OWNER   TABNAME         BYTES   NUM_ROWS AVG_ROW_LEN USEDSIZE_BYTES      %used
------- -------- ------------ ---------- ----------- -------------- ----------
FINANCE INVOICES   9753395200    1204707        3976     4789915032         49

The table's physical size decreased just a little, but not as much as expected.
So what causes this behavior, what extra factors can/must be taken into account and why can this table not be brought down to its actual data size ?


The table's definition here is causing the problem itself. The table is setup as following:

CREATE TABLE FINANCE.INVOICES
(
  INVOICE_ID    NUMBER(10)            NOT NULL,
  CUSTNO        NUMBER(8)             NOT NULL,
  INVOICE_DESC  VARCHAR2(4000 CHAR)   NOT NULL,
  INV_AMOUNT    NUMBER(10)
)
TABLESPACE DATA
PCTUSED    0
PCTFREE    10
INITRANS   1
;

With a database blocksize of 8192, it can easily be determined that only one row of the table fits in a block. The actual available blocksize for row-storage is 8192 - 10% ( pctfree ) = 7372 bytes
The average row length of the table is 3972, the maximum row_length is approx. 4028 bytes (not taking into account UTF encoding or whatever)

So, it is easily calculated that every block of this table only holds one row, and no more. This is confirmed by looking at the number of rows in the table compared to the number of blocks the table occupies:

SQL> select sum(blocks) from user_Extents where segment_name = 'INVOICES'
2>   union all
3>   select num_rows from user_Tables where table_name = 'INVOICES';

       CNT
----------
   1190600
   1206840

So this table holds approximately 8192 - 3972 = 4220 bytes of free space in each block it allocates.
That is 3400 bytes of free row space +  10% PCTFREE space = 820 bytes !

We can double check if this is correct, by doing the following math:

physical table size / ( db block size * average_row_length) = 4729063200

This value (almost) matches the value of the USEDSIZE in the above table being 4789915032.



Can we optimize these tables in any way at all??

Well, in our case for every row the database needs to read from disk, a whole block needs to read.
Put it the other way around, we only get one row when we read one block from disk.

If we would create a seperate tablespace for this table with 16kB blocksize, then it would make a difference. With still a PCTFREE of 10% we then would be able to store 3 rows in a single block, a 33% decrease of occupied space!!

A small test showed us that with the current setting (8k blocksize), a test table filled with 69000 rows occupies 68608 blocks  ( approx. 536Mb )

If we recreate the same table in a 16k blocksize tablespace, then the same table only occupies 23040 blocks ( approx. 360Mb )
This saved 33% of physical diskspace as expected.


Conclusion:

Table growth is not only about rows in a table. Tables that are known for frequent data changes, ( inserts/deletes and updates) can benefit from a shrink every now and then.
However, if this does not bring the expected result as seen above, then the DBA or Data Architects can look at other solutions like different blocksize or maybe a change in the structure of a table.