Sunday, August 29, 2010

JDBC Connectivity to Hypertable!

Hypertable (http://www.hypertable.org) is an implementation of Google's BigTable, which in short is a scalable, distributed, sorted hashtable that stores massive amounts of data across a cluster of commodity hardware. Hypertable has a rich query language (HQL) and a command line client akin to MySQL's; however, working only from the command line can be limiting.

This article will cover two things:
1) JDBC Connectivity to Hypertable
2) Usage of this driver to graphically browse and modify a Hypertable instance

JDBC Driver

Hypertable uses the Thrift protocol for remote connections and queries. This is a very useful way to read and write data, but it is not a standard database interface, which can make it difficult to develop applications on top of Hypertable. For Ruby developers, there is a package called HyperRecord that provides an ActiveRecord-like interface to Hypertable, making it much easier to integrate Hypertable into a Ruby (on Rails) application. For Java, I recently released a JDBC driver that communicates with Hypertable through Thrift, so you can now integrate Hypertable into a Java application via JDBC. One such application is a graphical browser, which was the motivation behind writing this driver.

Driver Limitations:
  1. Only the latest timestamped version of the data is returned.
  2. When retrieving metadata objects (ResultSet or Database), only the column families defined in the schema are returned. In other words, column qualifiers are not shown in the metadata objects, although their values can be retrieved in code. Example: in code, you can call rs.getString("address:home") and rs.getString("address:work"), while the metadata will only show "address" as a valid column name, with no value unless 'address' explicitly has a value in the given table.

You can download the driver at http://github.com/downloads/ANithian/hyperjdbc/hypertable-jdbc_0.1.tar. Simply add all the jars to your classpath and use the driver class "org.hokiesuns.hypertable.jdbc.HTDriver". An example URL is "jdbc:hypertable://192.168.116.128:38080", with no username, password or schema. For an example application, run org.hokiesuns.hypertable.HypertableJDBCTester, passing the ThriftBroker hostname or IP address as the command line argument.
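
To make the driver class and URL format above concrete, here is a minimal sketch of what a client might look like. Only the driver class name, the URL format and the rs.getString("address:home") style of access come from this post; the table name "contacts", the query string, and the assumption that the driver accepts it as written are hypothetical and only for illustration. It also assumes a reasonably recent JDK for the try-with-resources syntax.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

class HypertableJdbcSketch {
    // Driver class name as given in this post.
    static final String DRIVER_CLASS = "org.hokiesuns.hypertable.jdbc.HTDriver";

    // Builds a URL of the form jdbc:hypertable://host:port (no username, password or schema).
    static String buildUrl(String thriftBrokerHost, int port) {
        return "jdbc:hypertable://" + thriftBrokerHost + ":" + port;
    }

    public static void main(String[] args) throws Exception {
        // ThriftBroker host; defaults to the example address from this post.
        String host = args.length > 0 ? args[0] : "192.168.116.128";

        // Requires the driver jars on the classpath.
        Class.forName(DRIVER_CLASS);

        // ASSUMPTION: "contacts" is a hypothetical table with an "address" column family,
        // and the query text is only illustrative.
        try (Connection conn = DriverManager.getConnection(buildUrl(host, 38080));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM contacts")) {

            // Per limitation 2, the metadata lists column families only.
            ResultSetMetaData meta = rs.getMetaData();
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                System.out.println("column family: " + meta.getColumnName(i));
            }

            // Qualified columns can still be read by name in code.
            while (rs.next()) {
                System.out.println(rs.getString("address:home") + " / " + rs.getString("address:work"));
            }
        }
    }
}
```

Run it with the driver jars on the classpath, passing the ThriftBroker hostname or IP as the only argument.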

Graphical Browser

The motivation behind writing this driver was to be able to view data in Hypertable using a graphical browser instead of the command line. There are numerous Java-based graphical browsers; the one I tested with was SQL Workbench/J (http://www.sql-workbench.net/). Rather than providing screenshots with explanations, I have created a video that shows this driver in action and how easy it is to start talking to Hypertable using a third-party graphical interface!



Conclusion

Beyond enabling graphical browsing of Hypertable data, I believe that releasing a JDBC driver for Hypertable will help make its use more widespread and make it easier to plug this great technology into the vast Java application landscape. If you see any bugs, please file a bug report, along with all the necessary information, at http://github.com/anithian/hyperjdbc/issues.

Saturday, January 2, 2010

Setting up Apache Solr in Eclipse

Apache Solr is a powerful software package that lets you build your own search engine quickly. It is written purely in Java with Lucene at its core, and it can run inside any servlet container such as Tomcat or Jetty. Eclipse is an IDE that makes developing Java applications much easier thanks to features such as code completion and refactoring, not to mention the number of free plugins available. I find it much easier to keep everything in one place, and being able to code, debug, and test inside Eclipse makes developing my search engine much easier. This simple tutorial will show you how to set up Apache Solr to run inside Eclipse using a free third-party plugin that runs Jetty inside Eclipse.

You will need:
  • Eclipse
  • Apache Solr (this tutorial uses version 1.4.0)
  • The RunJettyRun plugin for Eclipse

Step One: Basic Setup

Download and extract both the Eclipse and Apache Solr archives somewhere on your disk. Since Solr has some XML configuration files, I would also suggest installing the Eclipse WTP (Web Tools Platform), which gives you some good built-in XML editors.



Follow the Getting Started guide in the RunJettyRun wiki to install the plugin. It should be pretty fast and easy.

Step Two: Create your Java project

Create a standard Java project in Eclipse (File..New..Java Project). Call it what you wish (I called it "TestSolr"). The default options should be fine: either click through the wizard to see all the options, or click Finish at your first chance to get done faster.



Here you should see your TestSolr project in your workspace with an empty src folder.

Step Three: Set up the Solr webapp in your Eclipse project

This is where the RunJettyRun plugin installed earlier gets used. This plugin lets you develop, run and debug web applications inside Eclipse, so you can take advantage of Eclipse's powerful code editing and debugging capabilities. It's also one of the simpler web application development plugins available. Eclipse supports full-blown web development, but for the purposes of this tutorial and typical development needs, this plugin is more than enough.
  • Inside the TestSolr directory, create a folder called "webapp". Do this by right clicking on the "TestSolr" project in the workspace, then selecting "New" and then "Folder".
  • At a command prompt, change into this webapp folder and unjar the apache-solr-1.4.0.war there. In Windows, this would be done with the following command (adjust the path to wherever you extracted Solr):
    jar -xvf c:\applications\apache-solr-1.4.0\dist\apache-solr-1.4.0.war
    The contents of the war file should now be in the webapp folder. To confirm, right click on the "TestSolr" folder in the Eclipse workspace and select "Refresh."
  • Add all the jar files in webapp/WEB-INF/lib to the Build Path: select all the jars, right click on any of them (while all are selected), open the "Build Path" sub-menu and select "Add to Build Path".
  • Set up a Solr home folder inside your project. For the purposes of this tutorial, copy the "solr" folder from the "example/" folder of your Solr distribution into the "TestSolr" folder in your Eclipse workspace. By now, your "TestSolr" project should contain the "src", "webapp" and "solr" folders.
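
If you want to double check the layout produced by the steps above, a small sanity check along these lines may help. This is only a sketch: the class name SolrLayoutCheck is made up for illustration, and the list of expected paths is my assumption about what the extracted Solr 1.4.0 war and the copied example "solr" folder contain (WEB-INF/web.xml, WEB-INF/lib, conf/solrconfig.xml and conf/schema.xml).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class SolrLayoutCheck {
    // Paths relative to the project root (e.g. the TestSolr folder).
    // ASSUMPTION: these are the entries expected after the war is extracted
    // into "webapp" and the example "solr" folder has been copied in.
    static final String[] EXPECTED = {
        "src",
        "webapp/WEB-INF/web.xml",
        "webapp/WEB-INF/lib",
        "solr/conf/solrconfig.xml",
        "solr/conf/schema.xml"
    };

    // Returns the expected paths that do not exist under projectRoot.
    static List<String> missingPaths(File projectRoot) {
        List<String> missing = new ArrayList<String>();
        for (String relative : EXPECTED) {
            if (!new File(projectRoot, relative).exists()) {
                missing.add(relative);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass the project folder as the first argument, or run from inside it.
        File root = new File(args.length > 0 ? args[0] : ".");
        List<String> missing = missingPaths(root);
        if (missing.isEmpty()) {
            System.out.println("Layout looks complete: " + root.getAbsolutePath());
        } else {
            for (String path : missing) {
                System.out.println("Missing: " + path);
            }
        }
    }
}
```

Anything reported as missing usually means the war was extracted in the wrong folder or the "solr" folder was copied to the wrong place.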

Step Four: Let's run this thing!

Now that you have set up your project, it's time to create a run configuration for Jetty and run this!
  • In Eclipse, go to the Run menu and select "Run Configurations...".
  • On the left rail, you should see "Jetty Webapp" as one of the run configuration types. Right click on this and select "New". The project should be "TestSolr" (if not, type in TestSolr in the text box labeled with "Project"). The name of the run configuration is populated with "TestSolr" and can be whatever you wish.
  • The default HTTP port is 8080 and can be left alone if you wish. To conform to the Solr tutorials (which makes copying and pasting links easier), change this to 8983. Delete the HTTPS port 8443 since we aren't doing any SSL access. This will disable the SSL-specific fields such as the keystore/password fields.
  • Change the "context" from "/" to "/solr". This is again for conformity with the Solr tutorials, since all links in the wiki point you to something like http://localhost:8983/solr/.
  • Type in "webapp" in the "WebApp dir" text box. This is the root of the web application and is the directory above the WEB-INF folder. This is where the solr war file was extracted and hence is the equivalent of deploying the solr war in your servlet container.
Your launch configuration should look like the following:


Now click the "Run" button, which saves the changes and launches Jetty; the console should start printing logs from both Jetty and Solr. Open a browser to http://localhost:8983/solr/admin/ (use port 8080 instead if you kept the default HTTP port) and voila! You should see the Solr admin page. Scrolling through the console, you shouldn't see any exceptions.
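
Besides opening a browser, you could check the admin page from code. The sketch below uses only the JDK's HttpURLConnection; the class name SolrAdminCheck and the timeouts are made up for illustration, and the default port and context assume the run configuration values described above (8983 and "/solr").

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class SolrAdminCheck {
    // Builds the admin URL for a given HTTP port and context, e.g. 8983 and "/solr".
    static String adminUrl(int port, String context) {
        return "http://localhost:" + port + context + "/admin/";
    }

    // Issues a GET request and returns the HTTP status code.
    static int statusOf(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(2000); // illustrative timeouts, in milliseconds
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Pass the HTTP port from your run configuration; 8983 is assumed otherwise.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8983;
        String url = adminUrl(port, "/solr");
        try {
            System.out.println(url + " -> HTTP " + statusOf(url));
        } catch (IOException e) {
            System.out.println(url + " is not reachable: " + e.getMessage());
        }
    }
}
```

If the URL is reported as not reachable, check that the port and context passed here match the ones in your Jetty run configuration.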

This simple tutorial just shows the basics of setting up a simple Solr installation and running it inside Eclipse using Jetty. Notice that we don't have any of our own code running that may do some extra things (say, custom analysis, tokenizing, etc.). If we did have such code, it would naturally live in the "src/" folder; however, we need to instruct Eclipse to compile the code into the WEB-INF/classes folder so that it runs inside Jetty. To do this:
  • Right click on the TestSolr project and select "Properties".
  • Click on "Java Build Path" on the left rail.
  • Click on the "Source" tab and towards the bottom, change the default output folder to be "TestSolr/webapp/WEB-INF/classes".


Now you can develop your custom Solr plugins directly in Eclipse and debug them immediately. Instead of running the TestSolr Jetty instance, you can launch it in debug mode by going to the Run menu, opening "Debug Configurations...", selecting the Jetty configuration created earlier and clicking the "Debug" button. I would encourage familiarizing yourself with Eclipse's debugging capabilities, for they are vast and amazing.

If you have any questions or comments, please post them and I'll do my best to answer them as quickly as possible.