In this article I’ll give a brief overview of Oracle 12c In Memory – a new feature designed to achieve sub-second response times for analytic queries. Don’t worry if you’re not a DBA – this is a plain-English overview to get you started.
Business Intelligence dashboards demand sub-second analytic query performance against millions of rows. The Oracle 12c In Memory Option (OIM) aims to deliver very fast results by transparently holding data in memory – thereby avoiding slower disk access.
In real-world tests on a dimensional data warehouse application, it delivered performance gains of 4-27 times when compared to the traditional approach.
The diagram below illustrates how Oracle normally reads data from disk and holds it in memory in the Buffer Cache. As the cache is normally small relative to the overall database size, data is gradually aged out over time on a least recently used basis. An additional area, the KEEP buffer cache, can be used to pin frequently used tables in memory and reduce expensive disk accesses, and the option exists to prioritise data held in the cache; otherwise, infrequently read data will gradually be aged out.
OIM creates an additional column-structured cache designed to permanently hold frequently accessed data in memory. As the data is held in a column (rather than the default row) format, it benefits from higher compression rates, and data access is optimised for analytic queries, which typically select a few columns from very wide tables.
This means, for example, a query summarising SALES by REGION from a table with 100 columns would only need to access around 2% of the data (two of the 100 columns), which can result in impressive performance gains.
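To make this concrete, a query of this shape would be served almost entirely from the two cached columns. This is a sketch – the SALES table and its column names are hypothetical:

```sql
-- Hypothetical 100-column SALES table: an aggregate like this only
-- needs to scan the REGION and SALES_AMOUNT columns from the
-- in-memory column store, not the other 98 columns.
SELECT region,
       SUM(sales_amount) AS total_sales
FROM   sales
GROUP  BY region;
```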
The diagram below illustrates how it works. On a system without OIM (on the left), the user submits a query which reads from the Buffer Cache. If the data is not available in memory, it triggers physical read operations, and results are returned to the user.
With OIM (on the right), if all the data needed is held in memory, the query executes entirely against the In Memory cache. Queries can transparently combine data from the In Memory and Buffer Caches.
Unlike the Buffer Cache, which holds entire rows, the In Memory option allows the DBA to hold specific columns in memory, making more efficient use of this expensive resource. This can be extended to specific partitions, which means, for example, a system could cache the most frequently accessed columns from the most recent (and actively queried) data in memory.
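As a sketch, column- and partition-level control looks like this – the table, column and partition names below are illustrative, not from a real system:

```sql
-- Cache only the frequently queried columns; exclude a wide text column.
ALTER TABLE sales
  INMEMORY MEMCOMPRESS FOR QUERY LOW (region, sales_amount)
  NO INMEMORY (order_notes);

-- Cache the most recent, actively queried partition only.
ALTER TABLE sales MODIFY PARTITION sales_2024 INMEMORY;
ALTER TABLE sales MODIFY PARTITION sales_2010 NO INMEMORY;
```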
OIM also implements the Storage Index technology used in the Exadata hardware appliance. Unlike traditional indexes, which are deployed by the DBA, these are transparently built and maintained by Oracle itself, and they speed up in-memory data scans by skipping over data based upon the query's WHERE clause.
Like many Oracle features, the In Memory option is transparent to the developer, and data can be marked for addition to or removal from memory with a simple ALTER statement, with no application changes required.
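For example, marking a table for in-memory population (and removing it again) is a one-line change. Note the column store must first be sized via the INMEMORY_SIZE instance parameter – the table name and sizes here are illustrative:

```sql
-- Reserve space for the in-memory column store (requires a restart).
ALTER SYSTEM SET inmemory_size = 16G SCOPE = SPFILE;

-- Mark a table for population in the column store...
ALTER TABLE sales INMEMORY;

-- ...and remove it again, with no application changes.
ALTER TABLE sales NO INMEMORY;
```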
Queries can however also combine data from both the In Memory and Buffer Cache. For example, the most recent results can be held in memory, while less frequently accessed (older data), or dimensions can be read from disk as needed.
Normal Insert and Update operations receive no benefit at all, as these are performed against the Buffer Cache. However, selected columns or entire tables can be pinned in memory and automatically compressed using columnar compression. I found that dimension tables (with frequently repeated text values) compressed around 2-7 times, while fact tables typically compressed by around 50%. It's also possible to tune the level of compression and to prioritise the speed at which data is refreshed as it's changed.
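Both the compression level and the population priority are controlled through the same ALTER syntax. A sketch, using the documented MEMCOMPRESS levels – the table names are illustrative:

```sql
-- Dimension table: trade some scan CPU for a smaller memory footprint.
ALTER TABLE dim_customer INMEMORY MEMCOMPRESS FOR CAPACITY HIGH;

-- Fact table: favour scan speed, and populate eagerly at startup.
ALTER TABLE fact_sales INMEMORY MEMCOMPRESS FOR QUERY LOW PRIORITY CRITICAL;
```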
While many DBAs and application developers are aware of frequently accessed tables, choosing which partitions or even columns to pin in memory is a challenge. As memory is a finite and valuable resource, it makes sense to spend time tuning usage, and it's sensible to use the Oracle Diagnostics and Tuning Pack to help. Be aware that if a query includes a column which is not pinned in memory, the default Buffer Cache method will be used.
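To see what has actually been populated into the column store (and whether it fits), the V$IM_SEGMENTS dynamic view can be queried – a minimal sketch:

```sql
-- What is currently populated in the column store, and how big is it?
SELECT segment_name,
       inmemory_size,
       bytes_not_populated,
       populate_status
FROM   v$im_segments
ORDER  BY inmemory_size DESC;
```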
Although data fetches are often the primary driver of disk traffic, performing large sort operations with an insufficient Sort Area Size will also lead to disk accesses, which may eliminate the performance gains. As most analytic queries include a GROUP BY or ORDER BY clause, it's therefore worth checking that all sort operations complete entirely in memory, and tuning the PGA Sort Area Size accordingly.
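One way to check for spilled sorts, and to raise the overall PGA budget, is sketched below – the 8G figure is purely illustrative:

```sql
-- Compare in-memory sorts with sorts that spilled to disk.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('sorts (memory)', 'sorts (disk)');

-- Raise the overall PGA memory budget (size is illustrative).
ALTER SYSTEM SET pga_aggregate_target = 8G;
```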
During a series of test queries, I carefully checked for physical disk operations, and was surprised to find Oracle Parallel Query had a significant negative impact upon query performance. In fact, query performance went from three times faster to over 12 times faster when I switched to serial processing.
It seems using Oracle Parallel Query on very large data sets can produce unexpected physical I/Os, as each parallel slave process needs its own memory area to collate and sort the data, which in turn magnifies the problem of large sort operations.
Switching off Oracle Parallel Query for In Memory operations has a huge scalability benefit, as each user executes on a single core instead of flooding the machine with parallel processes. It does, however, suggest a potential architectural hardware decision – buy machines with many very fast processors and lots of memory to maximise the benefits.
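Parallelism can be switched off for a whole session, or for an individual statement with a hint. A sketch – the table name and alias are illustrative:

```sql
-- Force serial execution for the whole session...
ALTER SESSION DISABLE PARALLEL QUERY;

-- ...or for a single statement via a hint.
SELECT /*+ NO_PARALLEL(s) */ region, SUM(sales_amount)
FROM   sales s
GROUP  BY region;
```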
Keep in mind a poorly designed application may not work any faster than before. In one case, another team were surprised to find zero benefit from using OIM to join two massive (multi-gigabyte) fact tables in memory. In reality they probably hit the limits of in-memory sorting, and would have benefited from a table redesign producing a single fact table with simple dimensional joins.
Likewise, OIM works best for analytic-type queries which summarise or calculate averages over large data sets. It's not a solution to improve OLTP applications, although it could be deployed on the same platform as an OLTP system to deliver fast operational reports.
Oracle now faces some serious competition in the Data Warehouse and Business Intelligence space. Dedicated column-based data stores like Sybase IQ and HP Vertica are now joined by Hadoop-based products including Impala from Cloudera and Apache Spark – both of which aim to provide sub-second query performance against massive data volumes.
Although architecturally Oracle 12c is rather a bolt-on solution to the challenge from faster column-based data stores, it does have the advantage of being a simple upgrade to existing systems. With performance gains of between 4 and 27 times compared to the traditional solution on a relatively inexpensive Oracle Data Appliance, it certainly provides a compelling solution. The fact this is entirely transparent to the application is a massive advantage, and it certainly gets my vote.
After 30 years as a freelance Data Warehouse Architect, I'm turning my attention to Big Data including Hadoop, NoSQL and NewSQL.