Rubicon 2.04
Critique by Eric Harmon

Sept/Oct 1999, Vol. 10, No. 3


Tamarack Associates
868 Lincoln Avenue
Palo Alto, CA 94301
TEL: 650.322.2827
FAX: 650.322.2827
EMAIL: sales@tamaracka.com
WEB: www.tamaracka.com

Product: Rubicon 2.04

Summary: Rubicon is a lightning-fast search engine for Delphi and C++ Builder programmers. Using Rubicon, you can search databases, text, HTML, and RTF data quickly and easily.

Suggested List Price: $299 for new users; $150 upgrade from Rubicon 1.x Professional Edition; $200 upgrade from Rubicon 1.x Workgroup Edition; $250 upgrade from Rubicon 1.x Standard Edition.


Rubicon is a text searching and retrieval engine that can locate key words or phrases blazingly fast. Chances are you have used Rubicon before without even knowing it. If you’ve visited http://developer.href.com to search for the answer to a sticky problem, you’ve used Rubicon to perform the search. Tamarack claims that Rubicon searches are anywhere from 1,000 to 5,000 times faster than SQL queries!

With the expectation that most developers will add search capability to their database applications, Rubicon supports a wide variety of popular database engines, including the BDE, Advantage, Apollo, DBISAM, FlashFiler, Halcyon, InterBase Objects, ODBC Express, Opus DirectAccess, Titan Access, and Topaz. However, Rubicon is not limited to searching only database-related data. Rubicon excels at searching information that can be indexed by an integer value, such as HTML files, RTF files, and text files.

Rubicon works by building an index of all searchable words in your database or document. Each word is associated with one or more integer IDs that indicate where that word is found. For databases, the integer ID will probably refer to an indexed field in the database or an absolute record number. For data that does not live in a database, such as a long document, the ID might refer to a paragraph number instead. When you perform a search, Rubicon typically needs to consult only the word index to determine whether a match is found.
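To make the idea of a word index concrete, here is a minimal sketch in Python. It is not Rubicon's code or API (Rubicon itself is written in Object Pascal); the function name build_word_index and the sample records are invented for illustration. It only shows the general technique of mapping each word to the set of integer IDs where it occurs.

```python
import re
from collections import defaultdict

def build_word_index(documents):
    """Map each lowercase word to the set of integer IDs where it occurs.

    `documents` is a dict of {integer ID: text}. The ID could stand for a
    record number or a paragraph number, as described in the review.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

# Hypothetical sample data, keyed by integer ID.
documents = {
    1: "Rubicon is a search engine",
    2: "Delphi ships with a database engine",
    3: "Search the database quickly",
}

index = build_word_index(documents)
print(sorted(index["engine"]))    # [1, 2]
print(sorted(index["database"]))  # [2, 3]
```

Looking up a word is a single dictionary access, so the original text never has to be read to answer a one-word query.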

Searching in Rubicon is a two-step process. First, you create the word index. The easiest way to do this is to use TMakeDictionary to (re)build the word index from scratch. Ideally, you would run this process at night or when the data is not actively being accessed. If your data is always “live,” you can use TUpdateDictionary to incrementally update the word index as the data changes. Once the word index is created, full-text searches can be performed in a matter of milliseconds.
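The difference between a full rebuild and an incremental update can be sketched as follows. This is an illustration in Python under assumed names (make_index, update_index); it does not reproduce the interfaces of TMakeDictionary or TUpdateDictionary, which the review does not describe in detail.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def make_index(records):
    """Full rebuild: scan every record and index every word."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in tokenize(text):
            index[word].add(rec_id)
    return index

def update_index(index, rec_id, old_text, new_text):
    """Incremental update: touch only the words that changed in one record."""
    old_words, new_words = tokenize(old_text), tokenize(new_text)
    for word in old_words - new_words:
        index[word].discard(rec_id)
        if not index[word]:
            del index[word]  # drop words that no longer occur anywhere
    for word in new_words - old_words:
        index[word].add(rec_id)

# Hypothetical records keyed by integer ID.
records = {1: "fast text search", 2: "slow SQL query"}
index = make_index(records)

# Record 2 is edited while the data is "live".
new_text = "fast SQL query"
update_index(index, 2, records[2], new_text)
records[2] = new_text

# The incrementally updated index matches a rebuild from scratch.
print(index == make_index(records))  # True
```

The incremental path does work proportional to the size of the changed record, which is why it suits data that cannot be taken offline for a nightly rebuild.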

Rubicon also supports searches on words and phrases, as well as boolean searches such as “search AND engine.” In addition, Rubicon supports advanced searches using the keywords near and like. For example, you can specify searches like “Delphi near Borland,” or “like MacDonald.”
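The review does not spell out how Rubicon defines near, so the following Python sketch rests on an assumption: two words are "near" each other when they occur within a small number of word positions. The function names and the max_distance parameter are hypothetical.

```python
import re

def positions(text):
    """Map each lowercase word to the list of positions where it occurs."""
    table = {}
    for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        table.setdefault(word, []).append(pos)
    return table

def near(text, first, second, max_distance=3):
    """True if `first` and `second` occur within `max_distance` words."""
    table = positions(text)
    return any(
        abs(p - q) <= max_distance
        for p in table.get(first.lower(), [])
        for q in table.get(second.lower(), [])
    )

close = near("Delphi is a Borland product", "Delphi", "Borland")
far = near(
    "Delphi was first released long ago by the company Borland",
    "Delphi",
    "Borland",
)
print(close)  # True: the words are 3 positions apart
print(far)    # False: the words are 9 positions apart
```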

Simple searches and boolean searches can be performed without ever hitting the actual data itself. Rubicon simply consults its word table to determine where the text is located. For phrase searches, like “Rubicon is fast,” Rubicon must do a little more work. Once it determines that all the words in the phrase appear in the same location, it must physically read the data to see if the words occur in order.
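The two kinds of search described above can be sketched in Python as follows. This is an illustration of the general approach, not Rubicon's implementation: search_and answers a boolean query from the index alone, while search_phrase uses the index to find candidates and then reads the text to check word order. All names and sample data are invented.

```python
import re

def words(text):
    """Split text into a list of lowercase words, preserving order."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Hypothetical documents keyed by integer ID.
documents = {
    1: "Rubicon is fast",
    2: "Fast is what Rubicon is",
    3: "SQL queries are slower",
}

# Word index: word -> set of IDs.
index = {}
for doc_id, text in documents.items():
    for word in words(text):
        index.setdefault(word, set()).add(doc_id)

def search_and(terms):
    """Boolean AND: intersect ID sets; the documents are never read."""
    id_sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*id_sets) if id_sets else set()

def search_phrase(phrase):
    """Phrase search: find candidates via the index, then verify order."""
    target = words(phrase)
    n = len(target)
    hits = set()
    for doc_id in search_and(target):
        doc_words = words(documents[doc_id])  # must read the actual data
        if any(doc_words[i:i + n] == target
               for i in range(len(doc_words) - n + 1)):
            hits.add(doc_id)
    return hits

print(sorted(search_and(["rubicon", "fast"])))   # [1, 2]
print(sorted(search_phrase("Rubicon is fast")))  # [1]
```

Document 2 contains all three words, so it survives the index lookup, but it fails the order check once its text is read.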

Rubicon 2.04 adds two new properties, FastPhrase and LazyPhrase, that can significantly speed up phrase searches at the expense of a larger word index. When FastPhrase is enabled, Rubicon not only indexes individual words, but also joins each pair of neighboring words to form one “word.” For instance, if your text contains the phrase “Rubicon kicks butt,” Rubicon indexes not only the three individual words, but also the “words” Rubiconkicks and kicksbutt. If you search for the phrase “Rubicon works,” Rubicon won’t find the “word” Rubiconworks in the word index, and so never consults the actual data.
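The pair-indexing idea behind FastPhrase can be sketched in Python like this. It is a simplified illustration under assumed names, not the FastPhrase implementation: neighboring words are concatenated and indexed alongside the single words, and a phrase query is answered by looking up its pairs.

```python
import re

def words(text):
    """Split text into a list of lowercase words, preserving order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index_with_pairs(documents):
    """Index single words plus each pair of neighboring words."""
    index = {}
    for doc_id, text in documents.items():
        tokens = words(text)
        pairs = [a + b for a, b in zip(tokens, tokens[1:])]
        for entry in tokens + pairs:
            index.setdefault(entry, set()).add(doc_id)
    return index

def phrase_candidates(index, phrase):
    """Look up a phrase by its word pairs, without reading any document.

    The result is a candidate set: in this sketch, a long phrase could
    still match pairs found in different places, so an engine might
    verify the survivors against the data.
    """
    tokens = words(phrase)
    keys = [a + b for a, b in zip(tokens, tokens[1:])] or tokens
    id_sets = [index.get(k, set()) for k in keys]
    return set.intersection(*id_sets) if id_sets else set()

# Hypothetical documents: both contain the same three words.
documents = {1: "Rubicon kicks butt", 2: "Butt kicks Rubicon"}
index = build_index_with_pairs(documents)

print("rubiconkicks" in index, "kicksbutt" in index)   # True True
print(phrase_candidates(index, "Rubicon kicks butt"))  # {1}
print(phrase_candidates(index, "Rubicon works"))       # set()
```

The cost is visible in the index size: a text of n words adds up to n - 1 extra entries, which is the larger word index the review mentions.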

If you can tolerate the idea of reporting a few false matches, you can enable LazyPhrase in addition to FastPhrase. LazyPhrase short-circuits the phrase-matching logic, thereby issuing phrase searches at the same speed as boolean searches. For example, a search for “Tamarack Associates” would be executed as “Tamarack AND Associates.”
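The trade-off can be illustrated with a short Python sketch. It only models the behavior the review describes (treating a phrase as an AND of its words); the function names and sample text are invented and say nothing about how LazyPhrase is coded.

```python
import re

def words(text):
    """Split text into a list of lowercase words, preserving order."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Hypothetical documents keyed by integer ID.
documents = {
    1: "Contact Tamarack Associates in Palo Alto",
    2: "Our associates visited a tamarack grove",
}

index = {}
for doc_id, text in documents.items():
    for word in words(text):
        index.setdefault(word, set()).add(doc_id)

def lazy_phrase(phrase):
    """Treat the phrase as an AND of its words; index lookups only."""
    id_sets = [index.get(w, set()) for w in words(phrase)]
    return set.intersection(*id_sets) if id_sets else set()

def exact_phrase(phrase):
    """Verify word order by reading each candidate document."""
    target = words(phrase)
    n = len(target)
    hits = set()
    for doc_id in lazy_phrase(phrase):
        doc_words = words(documents[doc_id])
        if any(doc_words[i:i + n] == target
               for i in range(len(doc_words) - n + 1)):
            hits.add(doc_id)
    return hits

lazy = lazy_phrase("Tamarack Associates")
exact = exact_phrase("Tamarack Associates")
print(sorted(lazy))          # [1, 2]
print(sorted(exact))         # [1]
print(sorted(lazy - exact))  # [2], the false match
```

Document 2 is the kind of false match you accept in exchange for skipping the order check.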

Rubicon 2.04 ships with a spiral-bound, 180-page user’s manual. The manual goes into detail about the inner workings of the search engine, giving you the ammunition to tailor the engine to your program’s needs. However, the quickest way to get up and running with Rubicon is to play with the supplied example programs. Rubicon also ships with clearly written Object Pascal source code.

Rubicon 2.04 supports Delphi versions 1 through 4 and C++ Builder versions 3 and 4. Note that C++ Builder 1 is not supported.

Rubicon is the de facto standard for high-speed full-text searches. If you want to incorporate fast text searches into your applications, look no further. Rubicon is the solution.