Oracle Database, as of 10g Release 2, uses bloom filters in various situations. Unfortunately, no information about their usage is available in Oracle documentation. For this reason, I decided to write a paper to explain not only what bloom filters are, but also, and foremost, to describe how Oracle Database makes use of them. Specifically, to explain how the database engine uses bloom filters to reduce data communication between slave processes in parallel joins and to implement join-filter pruning.
Originally, I wrote this paper for the IOUG Select Journal. Even if I wrote it last June, I wanted to receive the printed copies before putting it online. Today a packet with five copies of the Q4/2008 issue and a polo shirt arrived… Hence, it is now available online as well.
If you don’t have access to Select Journal, you can download it from this page.
Very enjoyable paper.
I knew about Bloom Filters in general, but it was nice to see a simple PL/SQL implementation and to know where they are used within Oracle.
Pretty timely, too! ;-) Page 14, here …
I’ve made a start on your book, btw. Looks good so far but it’ll probably end up half-read for a few months like all the books do. I have a course slides deadline approaching so it’s back on the shelf for now ….
I’m aware of bloom filters in Oracle since December 2005. So, I wanted to write something about them for a long time. IIRC we also discussed about them at the 2006 UKOUG conference… Therefore, I’m really happy that this short paper is, at last, published.
I was laughing about the coincidence. I’ve been aware of bloom filters for quite a while too but it’s just funny that this should come out at a time when bloom filters happen to have been mentioned in the same week as they’re mentioned in an important Oracle white paper. It was just a small joke really.
Well, not *just* a joke. It’s nice that your paper comes out just when a lot of people might be wondering what a Bloom Filter is. Therefore it is timely in that sense.
Thanks Christian for this really nice whitepaper and helped me with understanding some of the concepts
(and thanks Doug for the link, although Kevin didn’t appriciated it)
Dennis Yurichev  posted on Oracle-L  about a list of Undocumented Oracle Functions. There are (at least) 2 which might be of some interrest. Maybe they could look the examples more real ;-)
Thanks for the interesting paper.
Any idea how Dennis Yurichev generates that listing of undocumented functions?
On Linux/Unix I would execute the following command:
Look at msqdef.o file in libserver11.a, it contain (if I’m correct) all functions, not just SYS_OP.
[…] Bloom Filters – by Christian Antognini. […]
[…] A Bloom filter is simply a small dataset created by the first slave set that is a “summary” of the values that are likely to be useful in the join. As it builds the in-memory hash table, the first slave set can construct a Bloom filter for the join values; when the hash table is complete the first slave set can pass the Bloom filter to the second slave set (possibly by way of the query co-ordinator, I haven’t checked the mechanism yet) allowing the second slave set to discard the data that definitely won’t be needed and passing only data that will probably be needed through the table queue. (Note: the Bloom filter is not a perfect filter, it may allow some data that isn’t needed. For more detailed information about Bloom filters, see this article by Christian Antognini.) […]