CQ5

AEM/CQ Reverse Engineering Demystified

Have you ever struggled to figure out how to implement a particular feature for Adobe AEM/CQ? Did your attempts to find what you needed on Google or in discussion groups fail? In your gut, do you know it's possible to implement the feature, since AEM performs a very similar operation? If you answered yes to any of these questions, some degree of reverse engineering may do the trick.

Read More

AEM Solr Search Now Available

I am happy to announce that AEM Solr Search is finally out in Beta! Visit http://www.aemsolrsearch.com/ and start integrating AEM with Apache Solr. Watch the video for a quick preview on building a rapid front-end search experience, then jump into the Getting Started guide and experiment with the Geometrixx Media Sample application.

Read More

adaptTo() 2014 - Integrating Open Source Search with CQ/AEM

I just received confirmation that I will be speaking at adaptTo() 2014. This session describes several approaches for integrating Apache Solr with AEM. It starts with an introduction to various pull and push indexing strategies (e.g., Sling Eventing, content publishing and web crawling). The topic of content ingestion is followed by an approach for delivering rapid search front-end experiences using AEM Solr Search. 

A quick start implementation of the search stack will be provided as part of this presentation. The quick start installer includes pre-configured instances of Apache Solr and Apache Nutch. This presentation will also include the source code for the Community Edition of headwire.com’s AEM Solr Search. AEM Solr Search is a suite of AEM search components and services designed to integrate with Apache Solr. 

There will be a hackathon session afterwards, so it would be great to see you in person.

Integrating Apache Solr with Adobe CQ / AEM

Recently, I have noticed a fair amount of interest from the CQ community regarding CQ / Solr integration. However, as most people have pointed out, there isn't a clear integration path documented anywhere. Given the interest, I will be posting regularly on the subject. This first post stays relatively high-level and discusses the possible integration points.

There are really two areas that should be considered when integrating Solr with CQ: indexing content and searching content. For the most part, you can treat these as two independent efforts.

Indexing CQ Content

Over the past six months I have experimented with multiple approaches to indexing CQ content in Solr. Each approach has its respective strengths and weaknesses:

  1. Crawl your site using an external crawler.
  2. Create one or more CQ servlets to serialize your content into a Solr JSON or Solr XML Update format.
  3. Create an observer within CQ to listen for page modifications and trigger indexing operations to Solr.

Using an External Crawler

Using an external crawler such as Nutch or Heritrix is perhaps the simplest way to start indexing your CQ content; however, it does have its drawbacks. Using a crawler means working with unstructured content, mainly in the form of HTML documents. While most crawlers do a decent job extracting the content body, title, URL, description, keywords, and other metadata, you typically need to define a strategy for extracting any additional data points needed to drive functionality such as faceting. This information can be extracted in several ways: with an external document processing framework (recommended), with Solr's Update Request Processor (not recommended), or with Solr's tokenizers for basic extraction.

The other drawback with this approach is that it uses a pull approach to indexing content. There are ways around this; however, using a crawler typically means that you will be sacrificing real-time indexing.

CQ Servlets & Solr Update JSON/XML

Another possible approach is to create one or more CQ servlets that produce a dump of your CQ content in Solr's Update JSON or Update XML format. The advantage here is that you are working with structured content and have full access to CQ's APIs for querying JCR content. An external cron job can then fetch this page using curl and post it to Solr.

A variation of this approach is to use a selector to render a page in either the Solr JSON or XML update format. 
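
To make this concrete, here is a minimal sketch of such a servlet that emits pages in Solr's Update JSON format. The /bin/solr/dump path, the content root, and the field names are illustrative assumptions, not part of any shipped API:

import java.io.IOException;
import java.util.Iterator;

import javax.servlet.ServletException;

import org.apache.felix.scr.annotations.sling.SlingServlet;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.apache.sling.commons.json.JSONException;
import org.apache.sling.commons.json.io.JSONWriter;

import com.day.cq.wcm.api.Page;
import com.day.cq.wcm.api.PageManager;

// Hypothetical servlet path; register it wherever it suits your application.
@SlingServlet(paths = "/bin/solr/dump", methods = "GET")
public class SolrDumpServlet extends SlingSafeMethodsServlet {

    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("application/json");
        response.setCharacterEncoding("UTF-8");

        PageManager pageManager = request.getResourceResolver().adaptTo(PageManager.class);
        // Hypothetical content root; parameterize this for your site.
        Page root = pageManager.getPage("/content/geometrixx/en");

        try {
            // Solr's Update JSON handler accepts a plain array of documents.
            JSONWriter writer = new JSONWriter(response.getWriter());
            writer.array();
            for (Iterator<Page> children = root.listChildren(); children.hasNext();) {
                Page page = children.next();
                writer.object();
                writer.key("id").value(page.getPath());
                writer.key("title").value(page.getTitle());
                writer.key("description").value(page.getDescription());
                writer.endObject();
            }
            writer.endArray();
        } catch (JSONException e) {
            throw new ServletException(e);
        }
    }
}

A cron job could then fetch http://localhost:4502/bin/solr/dump with curl and post the output to your Solr core's /update handler. Note that this sketch only dumps direct children; a real implementation would recurse through the page tree.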

CQ Observer

Using a CQ observer provides the tightest integration with Solr and, as such, provides real-time indexing capabilities. Like the CQ servlet approach, it simplifies content extraction since you are working with structured data. There are several methods for implementing an observer; refer to Event Handling in CQ by Nicolas Peltier. My personal preference is listening to page events and replication events using Sling Eventing. In this approach, once you receive an event, such as a page modification, you can use the SolrJ API to update the Solr index.
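
Here is a minimal sketch of such a listener using SolrJ 4.x, assuming a local Solr core at http://localhost:8983/solr/collection1; the field names and error handling are purely illustrative:

import java.util.Iterator;

import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Property;
import org.apache.felix.scr.annotations.Service;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.osgi.service.event.Event;
import org.osgi.service.event.EventConstants;
import org.osgi.service.event.EventHandler;

import com.day.cq.wcm.api.PageEvent;
import com.day.cq.wcm.api.PageModification;

// Subscribes to page events via Sling Eventing and pushes changes to Solr.
@Component(immediate = true)
@Service(EventHandler.class)
@Property(name = EventConstants.EVENT_TOPIC, value = PageEvent.EVENT_TOPIC)
public class SolrPageEventListener implements EventHandler {

    // Hypothetical Solr core URL; in practice, expose this as OSGi configuration.
    private final SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    public void handleEvent(Event event) {
        Iterator<PageModification> modifications = PageEvent.fromEvent(event).getModifications();
        while (modifications.hasNext()) {
            PageModification mod = modifications.next();
            try {
                if (mod.getType() == PageModification.ModificationType.DELETED) {
                    solr.deleteById(mod.getPath());
                } else {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", mod.getPath());
                    // Resolve the page here and add title, description, etc.
                    solr.add(doc);
                }
                solr.commit();
            } catch (Exception e) {
                // Log and continue; never block the event queue.
            }
        }
    }
}

Replication events can be handled the same way by subscribing to ReplicationAction.EVENT_TOPIC instead.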

Searching CQ

Once you have your CQ content indexed in Solr, you will need a search interface. While there are several approaches to building search experiences against Solr, the most popular server-side approach is to use Solr's Java API, SolrJ. For client-side integration, ajax-solr is a great choice.
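
For instance, a basic SolrJ query looks like the following sketch; the core URL, query terms, and field names are assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrSearchExample {

    public static void main(String[] args) throws Exception {
        // Hypothetical Solr core; point this at the core holding your CQ content.
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery query = new SolrQuery("triathlon"); // the user's search terms
        query.setRows(10);
        query.setFacet(true);
        query.addFacetField("tags"); // hypothetical facet field

        QueryResponse response = solr.query(query);
        for (SolrDocument doc : response.getResults()) {
            System.out.println(doc.getFieldValue("id") + " : " + doc.getFieldValue("title"));
        }
    }
}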

Lastly, I need to shamelessly plug an upcoming integration for CQ and Solr by headwire.com, Inc, aptly named CQ Solr Search. This integration offers support for building search interfaces using search components built on ajax-solr as well as a configurable CQ observer for real-time Solr indexing. We will be introducing the first public implementation on CQ Blueprints. Our intent is to provide one place for searching all CQ/Sling/JCR content on the web.

Upcoming

Based on community feedback, please stay tuned for the following:

  1. CQ Solr Search by headwire.com, Inc. - (Not yet available)
  2. A Step-by-Step Guide to Indexing CQ with Nutch (Coming soon)
  3. A Step-by-Step Guide to Indexing CQ with CQ Servlets (Coming soon)
  4. A Step-by-Step Guide to Indexing CQ using an Observer (Coming soon)

Deploying the FAST ESP Search API to CQ 5.5

This post is dedicated to any OSGi developer who has endured the pain of wrapping a third-party JAR in order to deploy it to an OSGi container.

In this post we will deploy the FAST ESP Java Search API to CQ 5.5. Since Microsoft does not provide an OSGi bundle for this API, we will create our own using the technique described in the CQ Blueprints post, Deploying 3rd Party Libraries.

The high-level approach is as follows:

  1. Download the FAST ESP Java Search API (version 5.3.0.6) from Microsoft Connect and upload it to your 3rd party Nexus repository. I assume that the readers of this post are familiar with Nexus and have their own repository.
  2. Create a Maven project to build the wrapped version of the API.
  3. Deploy the wrapped version of the API to your Nexus repository.
  4. Deploy the wrapped version of the API to CQ via the Felix console.
  5. Add the wrapped version of the API as a dependency to your Maven project.
  6. Update your CQ instance to allow sun.io to be exported as part of the Felix system bundle from the framework classloader.

Adding a 3rd Party JAR (esp-searchapi.jar) to Nexus

It is recommended that you add a proxy repository to http://repository.opencastproject.org/nexus/content/groups/public/ as this repository has the Xalan and Xerces artifacts used by this article.

  1. Log in as the admin user to your Nexus repository (i.e., http://localhost:8081/nexus/)
  2. Select Repositories and click 3rd party repository.
  3. Click the Artifact Upload tab and enter the following information:

    GAV Definition: GAV Parameters
    Group: no.fast
    Artifact: esp-searchapi
    Version: 5.3.0.6
    Packaging: jar

  4. Click the Select Artifact(s) to Upload… button and browse to the location of the FAST ESP Java Search API (i.e., esp-searchapi.jar).
  5. Once selected, click the Add Artifact button followed by the Upload Artifact(s) button.
  6. If successful, you should now have a vanilla version of the FAST ESP Java Search API that can be included as a dependency by Maven. This dependency will be used in the next step.

<dependency>
  <groupId>no.fast</groupId>
  <artifactId>esp-searchapi</artifactId>
  <version>5.3.0.6</version>
</dependency>

Create a Maven Project to Build the Wrapped JAR

Create the following POM. Please note: the dependencies listed in the POM below were determined by trial and error. I had many unsuccessful deployments to Apache Felix due to unresolved dependencies. In the end, the list of embedded dependencies included Xalan, Xerces, and Log4j. Most of the remaining dependencies, such as HttpClient and the javax.* packages, were satisfied by Felix. In fact, the only dependency that was not satisfied was the sun.io package. I solved this by allowing Felix to export and load the sun.io packages from the framework class loader.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

	<modelVersion>4.0.0</modelVersion>

	<groupId>no.fast</groupId>
	<artifactId>esp-search-api-wrapped</artifactId>
	<version>5.3.0.6</version>
	<packaging>bundle</packaging>

	<name>FAST ESP Search API</name>
	<description>An OSGi version of FAST ESP Search API</description>

	<properties>
		<esp-searchapi.version>5.3.0.6</esp-searchapi.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.apache.xalan</groupId>
			<artifactId>com.springsource.org.apache.xalan</artifactId>
			<version>2.7.1</version>
		</dependency>
		<dependency>
			<groupId>org.apache.xerces</groupId>
			<artifactId>com.springsource.org.apache.xerces</artifactId>
			<version>2.9.1</version>
		</dependency>
		<dependency>
			<groupId>log4j</groupId>
			<artifactId>log4j</artifactId>
			<version>1.2.15</version>
		</dependency>
		<dependency>
			<groupId>no.fast</groupId>
			<artifactId>esp-searchapi</artifactId>
			<version>5.3.0.6</version>
		</dependency>
	</dependencies>

	<build>
		<pluginManagement>
			<plugins>
				<plugin>
					<groupId>org.apache.felix</groupId>
					<artifactId>maven-bundle-plugin</artifactId>
					<version>2.3.5</version>
					<extensions>true</extensions>
				</plugin>
			</plugins>
		</pluginManagement>
		<plugins>
			<plugin>
				<groupId>org.apache.felix</groupId>
				<artifactId>maven-bundle-plugin</artifactId>
				<configuration>
					<instructions>
						<Import-Package>javax.*,sun.io.*,org.apache.commons.httpclient.*,org.apache.commons.logging.*</Import-Package>
						<Embed-Dependency>*;scope=compile|runtime</Embed-Dependency>
						<Embed-Directory>OSGI-INF/lib</Embed-Directory>
						<Embed-Transitive>true</Embed-Transitive>
						<_exportcontents>
							com.fastsearch.esp.search.*;version=${esp-searchapi.version}
						</_exportcontents>
					</instructions>
				</configuration>
			</plugin>
		</plugins>
	</build>
</project>

Run mvn clean install. This should produce a file called esp-search-api-wrapped-5.3.0.6.jar in your target directory.

Similar to before, upload this artifact to your 3rd party Nexus repository using the following:

GAV Definition: GAV Parameters
Group: no.fast
Artifact: esp-search-api-wrapped
Version: 5.3.0.6
Packaging: jar

You should now be able to use the new wrapped version of the API in your Maven POM by adding the following dependency.

<dependency>
  <groupId>no.fast</groupId>
  <artifactId>esp-search-api-wrapped</artifactId>
  <version>5.3.0.6</version>
</dependency>

Deploy the esp-search-api-wrapped-5.3.0.6.jar to CQ via the Felix console.

Lastly, edit <your-cq-instance>/crx-quickstart/sling.properties and add the following line. This will allow Felix to export sun.io and make it available from the framework classloader.

org.osgi.framework.system.packages.extra=sun.io

Once this change is made, restart CQ 5.

CQ5 WebDAV Support for Windows 7 64-bit

After a long break from working in the content management space, I returned to my CMS roots with a focus on CQ5. As a novice CQ5 developer, I've been chipping away at CQ5 recipes such as: "As a developer, I would like to access the CRX via WebDAV on my Windows 7 workstation." Simple question, right? Wrong. As it turns out, Windows 7 64-bit does not easily support mapping a WebDAV resource. Sure, there were claims that applying KB907306 would do the trick. It didn't. There were instructions on mucking with the registry. Really, this isn't the 90s. No thank you. Oh, wait...there are third-party freeware packages such as BitKinex. Again, no thank you. Lastly, there were some articles about changing the authentication scheme from Basic Authentication to Digest Authentication. Why can't I have native support?

Enough with the rant. I recently had a good experience building a command line WebDAV client called cadaver under Linux (CentOS). As a command line guy, I already had Cygwin running under Windows 7. Sure enough, Cygwin offers cadaver under All > Web.

For those of you running Windows 7 who need WebDAV support and don't mind using the command line, give cadaver a try. Once installed, connecting to the CRX is pretty painless.

  1. Launch Cygwin
  2. Create a file called ~/.netrc and include the following lines. This will allow you to interact with the CRX without being prompted for a username and password.
    machine localhost
    login admin
    password admin
    
  3. Run cadaver.
    $ cadaver http://localhost:4502/crx/repository/crx.default
    
  4. You should now receive a shell to interact with the CRX. Most of the commands are similar to a command line FTP client (ls, cd, get, etc.). Simply type help for a list of available commands.