Wednesday, 16 January 2008

Oracle Disk io subsystem performance

Oracle server disk subsystem
Disk performance for Oracle servers
Clancy Bufton
• Senior DBA
• TMD Computing
• Tasmania, Australia
clancy.bufton@dpac.tas.gov.au


Some existing installations
• Single disk
• Sliced into partitions
• UFS


Solaris UFS
• Block based Inode allocation
• Block size 8kByte
• file buffer cache
• “read ahead” and “write behind”

Solaris LVM
• Logical Volume Manager
• Mirroring
• Striping
• Concatenation
• RAID 1+0
• Software RAID 5 don’t do it


RAID levels
• RAID 0
– Stripe data across multiple disks
– No redundancy
– Decreases Mean Time To Data Loss MTTDL
– Loss of 1 disk destroys entire volume
– Higher IO throughput reads and writes
• RAID 1
– Mirrors a disk
– 2 way or 3 way
– Increases MTTDL
– Potential higher read throughput write stays the same
• RAID 0+1
– A mirrored stripe
– Loss of a single disk in one mirror destroys the entire mirror
• RAID 1+0
– A striped mirror
– Loss of a single disk only destroys mirroring of the partner disk
• RAID 4
– Stripe with parity disk
– Provides single disk failure protection
– Reads fast, writes slow as the parity drive
• RAID 5
– Stripe with striped parity
– Provide single disk failure protection, improves writes because parity is also striped, Reads are fast
• RAID 6
– Striped disk with two disk failure protection

Oracle ASM
• Integrated volume manager and database file system
• Eliminates need for Solaris LVM and UFS
• Automatically stripes and mirrors database files
• Continually optimises data file extent placement to eliminate hotspots
• Eases storage administration tasks


Disk IO
• What size is the IO ?
• How many IOs required per second ?
• IO Characteristic ?
– Sequential
– Random
• Concurrency ?
– To what degree ?

Storage hardware
• Disks
– How many
– Rotational speed ?
– Average seek time ?
– internal transfer rate ?
• Disk controllers
– How many ?
– How many channels ?
– Channel bandwidth ?
– Protocol ?
• Raid controllers
– What raid levels ?
– How many host ports ?

Solaris IO options
• Asynchronous IO
– Don’t block the process to wait for IO completion
– Issue all IO requests immediately
– Available since Solaris 6
• Direct IO
– Bypass the buffer cache
– Disables read ahead and write behind
– Simulates raw device performance


Disk IO tests
• Sequential
– Synch, buffer cache, 8KB IO size
– Asynch, buffer cache, 8KB IO size
– Asynch, direct, 8KB IO size
– Repeat but vary IO size (16KB, 128KB, etc)
– Test both reads and writes
• Random
– Synch, buffer cache, 8KB IO size
– Asynch, buffer cahce, 8KB IO size
– Asynch, direct, 8KB IO size
– Repeat but vary IO size (16KB, 128KB, etc)
– Test both reads and writes

Test cases
• Single disk
• 3 wide stripe 1M interlace
• Reads and writes

Test C code – Asynch Random Writes

/*
* wrbench
*
* SYNOPSIS A simple write-random benchmark application.
* USAGE wrbench [-d device] [-s device-size] [-b io-size]
* EXAMPLE wrbench -d /dev/rdsk/c1t6d0s0 -b 512 -s 9000000000
* (For a Sun 9GB disk, 512 byte IO size)
*
* Written by G. D. Musumeci for O'Reilly &
* Associates. This source code is released under the terms of
* the BSD License.
adapted for asynch and direct io by Clancy J Bufton
*/
#include
#include
#include
#include
#include
#include
int
main(int argc, char **argv)
{
extern char *optarg;
int c, r, FD;
char *device, *buf;
uint64_t deviceSize = 0;
int ioSize = 512, iter = 512;
hrtime_t st, ft;
long double t;
off64_t s, e;
aio_result_t aiores;
aio_result_t *restwo;
int directIO = 0;
/* Parse command-line options */
while ((c = getopt(argc, argv, "qd:s:b:i:")) != -1) {
switch(c) {
case 'd':
device = (char *)strdup(optarg);
break;
case 's':
deviceSize = atoll(optarg);
break;
case 'b':
ioSize = atoi(optarg);
break;
case 'i':
iter = atoi(optarg);
break;
case 'q':
directIO = 1;
break;
case '?':
if (isprint(optopt)) {
fprintf(stderr, "Unknown option `-%c'.\n", getopt);
} else {
fprintf(stderr, "Unknown option character `\\s%xc'.\n",
optopt);
}
return 1;
}
}
/* Quick sanity check */
if (device == NULL) {
fprintf(stderr, "wrbench: must specify a device to test with -d\n");
exit(1);
}
fprintf(stderr, "wrbench: Testing %s (%lld bytes) with %d-byte writes\n", device, deviceSize, ioSize);
/* Allocate space for *buf */
buf = (char *)malloc(ioSize * sizeof(char));
/* Seed the random number generator */
st = gethrtime();
srand48((long) st);
/* Open the device */
if ((FD = open(device, O_RDWR O_LARGEFILE )) < st =" gethrtime();" c =" 0;" s =" (((deviceSize" e =" lseek64(FD," restwo =" (aio_result_t" r =" aiowrite(FD," restwo =" aiowait(NULL))">aio_return == (ioSize * sizeof(char))) {
free(restwo);
}
else
{
perror("aio");
exit(0);
}

}
/* Stop timing */
ft = gethrtime();
directio(FD,DIRECTIO_OFF);
close(FD);
t = ((long double)ft - (long double)st) / 1000000000;
/* Output the result */
fprintf(stderr, "wrbench: %0.Lf %d-byte IOs/sec\n",
(iter / t), ioSize);
}





Test C code
• Remove lseek64 call for sequential test
• Use aiowrite, aioread for async
• directio(FD,DIRECTIO_ON) controls buffer cache on / off
• Change write to read for read test
• Original code adapted for direct io and asynch test
compile with e.g, gcc -laio seqread.c



Disk Test Results – Direct IO
• Random synchronous writes
– 3 wide 1M stripe IO/sec
• 8k = 39, 128k = 25
– Single disk IO/sec
• 8k = 35, 128k = 23
• Random asynchronous writes
– 3 wide 1M strip IO/sec
• 8k = 233, 128k = 109
– Single disk IO/sec
• 8k = 39, 128k = 25


Disk test results – async direct







Oracle IO options
• Initialisation parameter
• FILESYSTEMIO_OPTIONS =
– { none setall directIO asynch }
– Reboot server after setting this parameter
• Enable direct and asynchronous with SETALL
– In some situations directio isn’t enabled
– Workaround force directio on solaris mount point
– mount -o forcedirectio /dev/md/dsk/d1 /testfs
• Reclaim the freed memory and allocate it to the Oracle buffer cache

Determining the Oracle IO size ?
• The oracle block size 8192
• DB_FILE_MULTIBLOCK_READ_COUNT
– Controls the maximum read IO size oracle will perform
– Example: 16 x 8k = 128K
– Used for full table scans
– Test results shows fewer large IO sustains higher throughput
• Oracle IO size
– Datafiles primary 8k but up to 8k * db_file_multiblock_read_count for full table scans
– Redo logs must be forced direct
• On ufs 8k max io size, slow
• On direct ufs 60k io size, much faster

Oracle IO tests
• create table test as select * from dba_objects
• Check redo log file throughput
• Check data file read and write throughput
• Insert into test select * from test where rownum <>• First alter system checkpoint;

• Next run the insert

• Then rollback the insert so that all buffers are loaded


• Rerun the insert and note the time, commit the transaction


• Alter system checkpoint to flush the freshly dirtied buffers and note the time

Oracle test results



•Test case 1 result due to throughput attainable on redo logs
•Test case 2 result due to throughput attainable on data files





SAN Considerations • How to estimate the IO size ? – Empirical evidence required – Supports selection of an appropriate stripe depth – Oracle recommendation for SAME is 1Mbyte stripe depth • Gauge the number of IO required per second at peak times • Because the disks are connected to multiple hosts IO requests will vary • The RAID controller slices up IOs – Striping hides the IO size from the disk because disks only see IO size equal to the stripe depth – How many IOs will be split ? • How many host ports available ? • Bandwidth of host ports ?

Tuesday, 15 January 2008

Date Dimension generator Oracle PL/SQL

This PL/SQL procedure populates a date dimension table for use in dimensionally modelled data warehousing

create or replace procedure date_dim()as declare
start_year NUMBER;end_year NUMBER;last_key NUMBER;
the_date DATE;
begin
start_year := 2008;end_year := 2009; the_date := to_date('01 Jan 'start_year,'dd mon yyyy');
select max(date_key) into last_key from sla_report_presentation.date_dimension;
while (extract(year from the_date) <= end_year) LOOP last_key := last_key + 1; --DBMS_OUTPUT.PUT_LINE(to_char(the_date,'DAY d ddd dd mm MONTH yyyy')); insert into sla_report_presentation.date_dimension values (last_key ,to_char(the_date,'YYYY') ,to_char(the_date,'MONTH') ,to_char(the_date,'DD') ,to_char(the_date,'DAY') ,case when to_char(the_date,'DAY') in ('Saturday','Sunday') then 'Weekend' else 'Weekday' end ,the_date ,to_char(the_date,'DDD') ,to_char(the_date,'MM') ); the_date := the_date + interval '1' day; end loop;
commit;
end;

Sunday, 6 January 2008

MS SQL Server Cursors

API Server cursors

The downside of server side cursors is that each row has to be fetched one by one, meaning a round trip from the client to the server for each row.

The advantage of server side cursors is that the entire rowset doesn’t need to be shipped over the network when the query is executed. However, this is only useful if the client typically only needs a handful of rows at a time, slowly iterating through the rowset, perhaps based on user input, for example an input form with “next” and “previous” buttons.

If the entire result set needs to be retrieved for example to build a report, it is very inefficient to use server side cursors.


Basic cursor test

As a test the following query is run, once as a normal transact-sql statement and again as a cursor, using the fetch command.

select * from "APPT6" with ( NOLOCK) Where ("APPT6"."bAuditAppt" = 0) and ("APPT6"."room" = 'Surgery 12_2') and ("DELETED" = 0) Order By "APPT6"."bAuditAppt" ASC,"APPT6"."room" ASC,"APPT6"."dtFollowup" ASC,"APPT6"."RecordNum" ASC

Running this query results in 10521 rows being retrieved in 2 seconds.
Execution consumed 281 milliseconds of CPU time.

Now

DECLARE test_cursor CURSOR
FOR
select * from "APPT6" with ( NOLOCK) Where ("APPT6"."bAuditAppt" = 0) and ("APPT6"."room" = 'Surgery 12_2') and ("DELETED" = 0) Order By "APPT6"."bAuditAppt" ASC,"APPT6"."room" ASC,"APPT6"."dtFollowup" ASC,"APPT6"."RecordNum" ASC


OPEN test_cursor

FETCH NEXT FROM test_cursor

WHILE @@FETCH_STATUS = 0
BEGIN

FETCH NEXT FROM test_cursor

END

close test_cursor

deallocate test_cursor

Running the while loop to fetch the same 10521 rows takes 17 seconds.
Execution with the cursor consumed 1110 milliseconds of CPU time


Simulation test

Cursor performance penalties can be clearly shown using a test application. The test app is written in c# and performs a simple select query that is present in the TITANIUM application. The query is run using two different methods, with cursors and without cursors.

The application spawns 50 threads and each thread executes the same query 20 times with a 125 millisecond pause between each execution. The queries are run against the TITANIUM_TEST database. The SQL server instance has virtually zero load on it, so the CPU usage can be clearly seen.

Running with cursors the application produces around 50 cursor requests per second, equivalent to the average cursor request rate on production. The SQLSVR process %CPU Time can be seen to roughly follow the cursor request rate and matches the production CPU Time. Also overall CPU time can be seen to average around 30-40%.

Running without cursors the exact same query has a much lower %CPU Time and executes in less than half the time.





With Cursors, notice the SLQSVR (Yellow) requires a scale of 400% because of the quad processor host.









Without Cursors





Simulation code

C# Test code


public static void fast_query()
{


OleDbConnection nwindConn =
new OleDbConnection("Provider=SQLOLEDB;Data Source=10.8.199.88;Password=whatf0r;User ID=sa;Initial Catalog=TITANIUM_TEST");



OleDbCommand catCMD = nwindConn.CreateCommand();
catCMD.CommandText = "select * from \"CHART5\" with ( NOLOCK) Where (\"CHART5\".\"ridPatient\" = '0000022500:00019:00005') and (\"DELETED\" = 0) Order By \"CHART5\".\"ridPatient\" ASC,\"CHART5\".\"wUniquecode\" ASC,\"CHART5\".\"RecordNum\" ASC";

nwindConn.Open();


for(int i = 0; i < 20; i++)
{


OleDbDataReader myReader = catCMD.ExecuteReader();





while (myReader.Read())
{
string ls = myReader.GetString(0);
//Console.WriteLine("\t{0}", myReader.GetString(0));
}




myReader.Close();

Thread.Sleep(150);

}
nwindConn.Close();


}









public static void slow_query()
{

ADODB.Connection cn = new ADODB.Connection();
cn.Open("Provider=SQLOLEDB;Data Source=10.8.199.88;Password=whatf0r;User ID=sa;Initial Catalog=TITANIUM_TEST", "sa", "whatf0r", 0);

ADODB.Recordset rs = new ADODB.Recordset();
rs.CursorLocation = ADODB.CursorLocationEnum.adUseServer;
rs.CursorType = ADODB.CursorTypeEnum.adOpenDynamic;
rs.LockType = ADODB.LockTypeEnum.adLockBatchOptimistic;

//"select * from \"APPT6\" with ( NOLOCK) Where (\"APPT6\".\"bAuditAppt\" = 0) and (\"APPT6\".\"room\" = 'Surgery 12_2') and (\"DELETED\" = 0) Order By \"APPT6\".\"bAuditAppt\" ASC,\"APPT6\".\"room\" ASC,\"APPT6\".\"dtFollowup\" ASC,\"APPT6\".\"RecordNum\" ASC"

for (int i = 0; i < 20; i++)
{
rs.Open("select * from \"CHART5\" with ( NOLOCK) Where (\"CHART5\".\"ridPatient\" = '0000022500:00019:00005') and (\"DELETED\" = 0) Order By \"CHART5\".\"ridPatient\" ASC,\"CHART5\".\"wUniquecode\" ASC,\"CHART5\".\"RecordNum\" ASC", cn, ADODB.CursorTypeEnum.adOpenStatic, ADODB.LockTypeEnum.adLockBatchOptimistic, 0);
//rs.ActiveConnection = null;


bool finished = false;
while (!finished)
{
rs.MoveNext();

if (rs.EOF)
{
finished = true;
rs.MoveLast();

}
else
{


string ls = (string)rs.Fields[0].Value;
//Console.WriteLine(rs.Fields[0].Value);
}
}

rs.Close();

Thread.Sleep(150);
}
cn.Close();



}




Production Trace

The SQL Server trace also shows that cursor fetches are more prevalent that any other statement. The observable cursors are API cursors that result from ADO, ODBC or OLE DB server side cursors. No block fetches can be seen in the trace and this indicates that a fetch is being executed for each row in the result set.

http://msdn2.microsoft.com/en-gb/library/aa172588(SQL.80).aspx

As described in the above article API server cursors can be identified in the trace by the sp_cursoropen and sp_cursorfetch procedures being present.

Below is an excerpt from a profiler trace on the TITANIUMDENTAL user. One of the common queries can be seen. It is being requested as an API server cursor. A long list of fetch calls can be seen, the fourth parameter to the fetch call shows the number of rows being fetched which is 1 row per fetch.

The trace indicates that the duration from cursor open to the last fetch call is just under half a second. This provides adequate performance for the client; however the server sustains a CPU overhead for processing so many fetch calls in quick succession. This can be observed by watching the % CPU time by sql server threads.

In real time it can be seen that each thread spikes to around 30-40% CPU Time momentarily as each session executes its cursor fetches. Cumulatively it is witnessed as an average utilisation of about 30-40%