In this post, I will discuss about the methods to store LOBs in an Oracle database – Basic files and secure files.
you may need to store information in a non-stuctured or semi-structured form. Examples include pictures, word processing documents, spreadsheets, XML files, and so on. How are these types of data stored?
HOW TO STORE NON-STRUCTURES OR SEMI-SRUCTURED INFO IN DATABASE?
There are usually two approaches: The data is stored in the database as LOB fields (BLOB for binary and CLOB for character data), or in OS files with the references to the files stored in the database.
- BLOB: BLOBs contain binary information that is not subject to character set conversion. This would be an appropriate type to store a spreadsheet, a word processing document, image files, and the like.
- CLOB: CLOBs contain information that is subject to character set conversion. This would be an appropriate type to store large amounts of textual information, such as XML or just plain text.
- BFILE: This datatype permits you to store an Oracle directory object (a pointer to an operating system directory) and a file name in a database column and to read this file. This effectively allows you to access operating system files available on the database server in a read-only fashion, as if they were stored in the database table itself. The file is maintained outside of the database and is not really part of the database at all. A BFILE provides read-only access to the contents of the file.
Each approach has advantages and challenges. OS files can be cached by the OS and journaled filesystems that expedite recovery after crashes. They also generally consume less space than the data in the database since they can be compressed.
There are also tools that can intelligently identify patterns in the files and remove duplication for a more efficient storage; however, they are external to the database so the properties of the database do not apply to them. These files are not backed up, fine grained security does not apply to them, and such files are not part of a transaction–so concepts so innate to the Oracle database like read consistency do not apply.
What if you could get the best of both worlds? In Oracle Database 11g, you have the answer with SecureFiles, a completely new infrastructure inside the database that gives you the best features of database-resident LOBs and OS files. Let’s see how.
Traditionally, priot to Oracle Database 11g, you would have defined the table as follows:
conn / as sysdba
create user tst identified by tst;
grant connect, resource to tst;
create table basic_lob
( id number,
lob (txt) store as
enable storage in row
The various parameters indicate :
– the tablespace where the lobsegment and lobindex will be stored, and this may be different from the tablespace where the table itself resides.
– IN ROW Clause
. disable storage in row : LOB data is always stored separately from the table, in the lobsegment
. enable storage in row(default) : LOBs of up to 4,000 bytes will be stored in the table itself LOBs exceeding 4,000 bytes will be moved out of line into the lobsegment
– CHUNK SIZE : LOBs are stored in chunks; the index that points to the LOB data points to individual chunks of data. Chunks are logically contiguous sets of blocks and are the smallest unit of allocation for LOBs.The CHUNK size must be an integer multiple of your Oracle blocksize. Each LOB instance will consume at least one CHUNK.
If you pick a CHUNK size that is more than expected LOB sizes, you could end up wasting an excessive amount of space. For example, if you have that table with 7KB LOBs on average, and you use a CHUNK size of 32k, you will be wasting approx. 25k of space per LOB instance. On the other hand, if you use an 8KB CHUNK, you will minimize any sort of waste.
If chunk size is smaller than expected LOB sizes, it will affect your retrieval performance. A lobindex used to point to the individual chunks, and the more chunks you have, the larger this index is. If you have a 4MB LOB and use an 8KB CHUNK, you will need at least 512 CHUNKs to store that information. That means you need at least as many lobindex entries to point to these chunks. It takes longer to read and manage many small chunks than it does to read fewer, but larger, chunks. The ultimate goal is to use a CHUNK size that minimizes your waste, but also efficiently stores your data.
– PCTVERSION : This is used to control the read consistency of the LOB. The lobsegment does not use undo to record its changes; rather, it versions the information directly in the lobsegment itself. when you modify a LOB, Oracle allocates a new CHUNK and leaves the old CHUNK in place. If you roll back your transaction, the changes to the LOB index are rolled back and the index will point to the old CHUNK again. PCTVERSION controls the percentage of allocated LOB space that should be used for versioning of LOB data. The default is 10 percent .
– RETENTION : This is a mutually exclusive alternative to the PCTVERSION clause, and it is valid when you are using automatic undo management in the database. Rather than reserving some percentage of space in the lobsegment to version LOBs, the RETENTION clause uses the same time-based mechanism to retain data. The parameter UNDO_RETENTION database would also apply to LOB data in this case.
– CACHING: This clause controls whether or not the lobsegment data is stored in the buffer cache. One of the advantages of storing unstructured data in OS files instead of database resident objects is the facility of caching. Files can be cached in the operating system’s file buffers. A database resident object can also be cached in the database buffer cache. However in some cases the caching may actually be detrimental to performance. LOBs are usually very large (hence the term large objects) and if they come to the buffer cache, most other data blocks will need to be pushed out of the cache to make room for the incoming LOB. The LOB may Never be used later yet its entry into the buffer cache causes necessary blocks to be flushed out. Thus in most cases you may want to disable caching for the LOBs.
The benefits of caching are very application dependent. In an application manipulating thumbnail images, performance may be improved with caching. However, for larger documents or images, it is better to turn off caching.
NOCACHE : default implies that every access will be a direct read from disk and every write/modification will likewise be a direct read from disk.
CACHE READS : allows LOB data that is read from disk to be buffered, but writes of LOB data will be done directly to disk.
CACHE : permits the caching of LOB data during both reads and writes. Can be specified for small- to medium sized LOBs (used to store descriptive fields of just a couple of kilobytes). Caching refers to the LOB only. The rest of the table is placed into the buffer cache and follow the same logic as any other table regardless of the setting of the LOB caching on that table.
A LOB column always results in what I call a multisegment object, meaning the table will use multiple physical segments.
select segment_name, segment_type
The LOB in the table really just points to the lobindex, and the lobindex in turn points to all of the pieces(chunks) of the LOB itself.
As you didn’t specify it explicitly, the LOB is stored in the conventional format (BasicFiles).
Prior to 11g , we had certain assumptions about LOBs which led to BASIC FILE storage for them :
- LOBs would never grow beyond a size of a few hundred kilobytes or in extreme cases a megabyte or two. Based on this assumption, the largest chunk size the maximum amount of space that Oracle would have to convert from its source file to a LOB at one time could be kept relatively small as well, at just 32KB.
- LOBs would be loaded relatively infrequently usually, only once and then would be updated very rarely. After all, how often does a standard mailing label format or form letter change?
- Likewise, we anticipated that only one application user session would be modifying a physician’s picture or a mailing label master document at one time. Therefore, concurrency and read consistency was not going to be a major concern, so we didn t need to worry about the impact of maintaining additional rollback segments just for LOBs.
Now, the footprint of Large Objects has increased exponentially within our databases . It s being driven by diverse business needs that span every industry: medical and pharmaceutical applications that capture and serve up patient images, manufacturing applications that store components designs, and semi-structured data that business-to-business (B2B) applications use to order products to replenish a company s warehouse. However, these newer application requirements have several things in common:
- LOB sources are much more diverse. These sources include Extended Markup Language (XML) documents, geospatial data, biometric information, video files, and other semi-structured or non-structured data.
- Simultaneous multiple-user updates are common. As semi-structured data like XML has proliferated, the need to provide the ability for more than one user to simultaneously update the same LOB has also exploded.
- Large objects are gigantic … and getting larger. It s not uncommon for LOBs to extend into the gigabyte and even terabyte range, especially for audiovisual and medical imaging data. Transmission bandwidths have increased dramatically.
- Self-published user content has exploded. The YouTube phenomenon and other web sites that store self-produced audiovisual files for wide consumption has also contributed to the explosion of LOB sources.
- Security requirements have increased dramatically. Finally, many of these LOBs require special security especially personal medical data that is stored within Digital Imaging for Communication of Medical Information (DICOM) sources because of federal, state, and international regulations like the Health Insurance Portability and Accountability Act (HIPAA).
LOBs, the Next Generation: SecureFiles
As a result of these new business requirements, the LOB data type has been redesigned completely in Oracle 11g Release 1. This new LOB datatype is called a SecureFile.
If you want to store the LOB as a SecureFile, all you have to do is place a clause store as securefile in the table creation, as shown below:
create table secure_lob
( id number,
lob (txt) store as securefile
enable storage in row
- Performance and Manageability Improvements
Since the sources of LOBs have increased in size by several orders of magnitude the methods that load, store, and return data from LOBs have been refurbishment.
- Bigger CHUNK size
In prior releases, the maximum size of a chunk of LOB data that could be read or written at one time was limited to 32KB. Internal and external LOB sizes have grown exponentially , so this was a serious limitation to effective LOB I/O.
Oracle 11g remedies this by automatically choosing an appropriate CHUNK size between the database s standard block size and a maximum of 64MB. Further.
Oracle 11g helps limit the internal fragmentation of SecureFile LOBs by attempting to store extremely large SecureFile LOB segments on disk so that they re physically adjacent.
Finally, the ability to utilize variable CHUNK sizes means that the corresponding UNDO segments generated during updates of SecureFile LOBs only need to be as large as the CHUNKs and thus contain less wasted space.
- Improved REDO and UNDO Generation.
Before Oracle 11g, the Oracle DBA had relatively free rein when configuring CHUNK size, as well as the retention of undo segments for LOBs, and that didn t always yield optimal results.
Oracle 11g will choose automatically the best method for generating redo and undo for SecureFile LOBs, only two storage-related parameters are required: RETENTION and MAXSIZE. The RETENTION directive tells Oracle 11g how to handle the consistent read data (stored in UNDO segments) for a SecureFile LOB. Here s how the setting affects read consistency:
Table 1-1. RETENTION Directive Settings for SecureFiles
MAX is used to start reclaiming old versions after segment MAXSIZE is reached
MIN keeps old versions for the specified least amount of time
AUTO This is the default setting. It s basically a trade-off between space and time.
This is automatically determined
NONE Instructs Oracle 11g to reuse older versions of the SecureFile LOB as much as possible
The new MAXSIZE storage setting works in concert with the setting for RETENTION, and it determines the maximum size of a SecureFile LOB. When RETENTION is set to MAX, then the LOB is allowed to grow up to a size of MAXSIZE, after which space can be reclaimed from UNDO segments; otherwise, the MAXSIZE setting is treated as a hard limit on the size of the SecureFile LOB s segment.
Note that the storage attributes CHUNK, PCTVERSION, FREEPOOLS, FREELISTS, and FREELIST GROUPS are simply ignored for SecureFile LOBs
- Automatic Deduplication
One of the more intriguing SecureFile LOB features is the ability for Oracle 11g to automatically locate any LOBs that store exactly the same data within the same table and column. Once detected, Oracle 11g will store only one copy of the LOB data. The advantage here, of course, is dramatically reduced amounts of disk space are needed to store the same amount of data; further, redo logging and UNDO is reduced significantly because an update to SecureFile LOB data doesn t require retaining duplicate versions of redo log entries and UNDO segments.
SQL> alter table secure_lob
After the deduplication, the database calculates the hash values of the columns values in each row and compares them to the others. If the hash values match, the hash value is stored AND not the actual LOB. When a new record is inserted its hash value is calculated, and if it matches to another value then the hash value is inserted; otherwise the real value is stored.
You can also reverse the deduplication process:
SQL> alter table secure_lob
- Intelligent Compression.
This new feature allows Oracle 11g to compress and uncompress SecureFile LOB data automatically and transparently, and this means that disk space for these data is reduced dramatically. Whenever a query or DML statement retrieves or applies changes to a SecureFile LOB s data, Oracle 11g uncompresses only those data blocks needed for the operation, and it automatically maintains the required mapping between the uncompressed data blocks in the Database Buffer Cache and the database blocks on disk. There are two compression levels: MEDIUM (the default) and HIGH; specifying HIGH will tend to increase latency during SecureFile LOB access, but will also result in the highest storage compression ratios.
SQL> alter table secure_lob
Compression is not the same as deduplication. Compression happens inside a LOB column, inside a row each LOB column is compressed independently. In deduplication, all the rows are examined and duplicate values in the columns are removed and replaced with pointers. If you have two very different rows, deduplication will not reduce the size; but compression may optimize the space inside the LOB value. You can compress as well as deduplicate the table.
Compression takes up CPU cycles so depending on how much data is compressible, it may not be worthy of compression. For instance, if you have a lot of JPEG pictures they are compressed already, so further compression will not save any space. On the other hand, if you have an XML document as a CLOB, then compression may produce substantial reduction. SecureFiles compression automatically detects if the data is compressible and only spends CPU cycles if compression yields gains.
Also note that the LOB compression is independent of table compression. If you compress the table SECURE_LOB, the LOBs are not compressed. The LOB compression occurs only when you issue the above SQL.
In Oracle Database 11g Release 2, there is a third compression option in addition to HIGH and MEDIUM: LOW. As the name suggests, it compresses less but also consumes a lot less CPU and completes faster. This approach uses a block-based lossless compression similar to the fast Lempel Ziv Oberhumer (LZO) algorithm.
Let’s see an example of a table with the SecureFiles LOW compression:
create table docs
LOB(doc_body) store as securefile
Let’s see an example with the same table and column. First, we modify the column to uncompressed:
SQL> alter table docs
2 modify LOB(doc_body)
Now, let s modify the column for low compression:
SQL> alter table docs
2 modify LOB(doc_body)
4 compress low
If you omit the LOW clause, the default is MEDIUM. The LOW compression is not just for table creation; you can use it to alter an existing column as well.
- Transparent Encryption
Every organization has ethical and legal obligations to protect customer-sensitive information stored within LOBs. For example, a patient s medical records and radiology images must be protected. However, there s also enormous amounts of intellectual property information that must be stored in semi-structured object forms for example, molecular formulas and manufacturing blueprints that is crucial to any company’s competitive advantage.
SecureFile LOBs fill this hitherto serious gap in database security because it s now possible to encrypt LOB data within an Oracle 11g database. SecureFile LOBs are encrypted using the same Transparent Data Encryption (TDE) algorithms that were introduced in Oracle 10gR2, and this means it s possible to encrypt either any or all SecureFile LOBs within the same table using the four standard TDE encryption algorithms (AES128, AES192, AES256 and 3DES168).
sql>alter table secure_lob
(encrypt using ‘AES128′);
Before enabling encryption you have to set up encryption wallet.
Here are the steps in summary:
1. Set the parameter in sqlnet.ora, if not set already to specify the location of the wallet:
The directory /opt/oracle/orawall should already exist; if not then you should create it.
2. Create the wallet:
alter system set encryption key authenticated by “mypass”
This creates the wallet with the password mypass and opens it.
3. The above steps are needed only once. After the wallet is created and open, it stays open as long as the database is up (unless it is explicitly closed). If the database is restarted, you have to open the wallet with:
alter system set encryption wallet open identified by “mypass”
When a SecureFile LOB column is encrypted, the column values of all the rows of that table are encrypted.
SecureFiles are not merely the next generation of LOBs; they add much more value to them, especially the features that were formerly only in the domain of filesystems. SecureFiles can be encrypted for security, de-duplicated and compressed for more efficient storage, cached (or not) for faster access (or save the buffer pool space), and logged at several levels to reduce the mean time to recover after a crash. With the introduction of SecureFiles, you can store more unstructured documents in the database without incurring too much overhead or losing any critical functionality afforded by an OS filesystem.