Advanced AEM Search: Consuming External Content and Enriching Content with Apache Camel

I had the pleasure of speaking at CIRCUIT 2016 on a new architecture for indexing AEM content and external content using ActiveMQ, Apache Camel and Solr. My slides are available on SlideShare. The demo code for indexing both products and AEM content is available on GitHub.

Initially, I had planned a rather ambitious demo but ran out of time during the talk. As a result, I recorded a fairly lengthy video, which is available on YouTube.

A big thank you to all the attendees and conference coordinators!

Solr Document Processing with Apache Camel - Part III

For those of you who are still following along, let's recap what we've accomplished since the last post, Solr Document Processing with Apache Camel - Part II. We started by deploying SolrCloud with the sample gettingstarted collection and then developed a very simple standalone Camel application to index products from a handful of stub JSON files.

In this post, we will continue to work against the SolrCloud cluster we set up previously. If you haven't done this, refer to the Apache Solr Setup section in README.md. We will also start with a new Maven project, available on GitHub, called camel-dp-part3. This project is similar to the last version, but with the following changes:

  1. We will be using a real data source. Specifically, Best Buy's movie product line.
  2. We will introduce property placeholders. This will allow us to specify environment-specific configurations within a Java properties file.
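To make the placeholder idea concrete, here is a minimal plain-JDK sketch of what resolving {{key}}-style placeholders against a properties file amounts to. Camel performs this resolution itself through its properties component, so this only illustrates the concept and is not Camel's implementation; the property names solr.host and solr.collection are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {
    // Matches Camel-style placeholders such as {{solr.host}}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{([^}]+)\\}\\}");

    // Replaces every {{key}} in the template with the matching property value.
    // Fails fast on a missing key so a misconfigured environment surfaces early.
    static String resolve(String template, Properties props) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1).trim();
            String value = props.getProperty(key);
            if (value == null) {
                throw new IllegalArgumentException("Missing property: " + key);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // In a real project these lines would live in an environment-specific .properties file
        Properties props = new Properties();
        props.load(new StringReader("solr.host=localhost:8983\nsolr.collection=gettingstarted\n"));
        System.out.println(resolve("http://{{solr.host}}/solr/{{solr.collection}}/update", props));
    }
}
```

Keeping one properties file per environment (for example, dev and prod) means the route definition itself never has to change between environments.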

Solr Document Processing with Apache Camel - Part II

In my last post, Solr Document Processing with Apache Camel - Part 1, I made the case for using Apache Camel as a document processing platform. In this article, our objective is to create a simple Apache Camel standalone application for ingesting products into Solr. While this example may seem a bit contrived, it is intended to provide a foundation for future articles in the series.

Our roadmap for today is as follows:

  1. Set up Solr
  2. Create a Camel Application
  3. Index sample products into Solr via Camel
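For a sense of what actually reaches Solr in step 3, the sketch below builds (but does not send) a JSON update request for the gettingstarted collection using only the JDK's java.net.http client (Java 11+). It bypasses Camel entirely, and the Solr base URL and the id/name fields are illustrative assumptions rather than the project's actual schema.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SolrUpdateRequestDemo {
    // Builds (but does not send) a JSON update request for a Solr collection.
    // commit=true asks Solr to commit immediately: handy for a demo,
    // usually too expensive for bulk indexing.
    static HttpRequest buildUpdate(String solrBase, String collection, String jsonDocs) {
        URI uri = URI.create(solrBase + "/" + collection + "/update?commit=true");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonDocs))
                .build();
    }

    public static void main(String[] args) {
        String docs = "[{\"id\":\"sku-1\",\"name\":\"Sample product\"}]";
        HttpRequest req = buildUpdate("http://localhost:8983/solr", "gettingstarted", docs);
        System.out.println(req.method() + " " + req.uri());
        // Sending it would be: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
    }
}
```

Using commit=true keeps the demo simple; for larger loads you would normally rely on Solr's autoCommit settings instead.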

AEM Solr Search 2.0.0

We just released AEM Solr Search 2.0.0. This is the first major release since I gave my talk at adaptTo(). Check out the following links to get started:

  1. AEM Solr Search Wiki. This is the new source of truth for our documentation.
  2. AEM Solr Search 2.0.0 demo on YouTube.

We hope you enjoy the new release. Drop us a line at aemsolr@headwire.com if you need help with your AEM / Solr integration.

In our next release, expect document processing support and integration with a product catalog. This release will coincide with my talk at CIRCUIT in July: Advanced AEM Search - Consuming External Content and Enriching Content with Apache Camel.

pfSense + Netgear 108T Smart Switch + Cisco WAP321: Small Business Networking for the Home


Recently, I was fortunate to move into a new home that was pre-wired with Cat5 Ethernet. One of the first things I did was open the home's structured wiring closet; however, it looked like the previous owner had not taken advantage of the wired network, as there was no switch. I took this as a welcome opportunity to take on a small networking project with the following goals:

  1. Provide Gigabit connectivity throughout the house.
  2. Segment the various devices into separate logical networks via VLANs.
  3. Attempt to roll my own Gigabit router using pfSense. (I recently heard about pfSense on Security Now, episode #530.)
  4. Learn a bit about VLANs and pfSense.

Supporting Multi-Term Synonyms in hybris 5.4 / Solr 4.6.1


Recently, I was working with a client on a hybris 5.4 implementation and was asked to import their synonyms from their current platform. Easy enough, right? Wrong. The out-of-the-box synonym integration allows business users to define multi-term synonyms on the "from-side" in the hMC; however, at the time of this writing, Solr 4.x does not natively support multi-term synonyms on the from-side of the synonym mapping. For example, if we mapped a from-side synonym (e.g., classic gaming console) to a "to-side" (e.g., nintendo entertainment system), the hMC would silently accept the definition, but it would have no effect on the Solr side.
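One commonly cited reason for this limitation is that the standard Solr 4.x query parsers split the query on whitespace before analysis, so the synonym filter only ever sees classic, gaming and console one token at a time. The following is a deliberately simplified model of that behavior (a plain map lookup, not Lucene's SynonymFilter), using the example mapping above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SynonymSplitDemo {
    // Simplified model of query-time synonym expansion when the query is split
    // on whitespace BEFORE analysis: each token is looked up on its own.
    static List<String> expandPerToken(String query, Map<String, String> synonyms) {
        List<String> terms = new ArrayList<>();
        for (String token : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            terms.add(token);
            String mapped = synonyms.get(token);
            if (mapped != null) {
                terms.add(mapped);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, String> synonyms =
                Map.of("classic gaming console", "nintendo entertainment system");
        // The multi-term key never matches, because no single token equals it.
        System.out.println(expandPerToken("classic gaming console", synonyms));
    }
}
```

A single-term from-side (for example, nes mapped to nintendo) expands as expected in this model; only the multi-term key is silently skipped, which matches the "no effect on the Solr side" symptom.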


Building hybris Through a Proxy

Corporate IT organizations are notorious for making enterprise systems nearly unusable through their strict security policies. One such restriction is to block both ingress and egress traffic to the public Internet. So, what do you do when your enterprise application needs Internet access? In most cases, firewall requests for such access are either strictly prohibited or subject to a lengthy bureaucratic process that puts your project’s timeline at risk. Most corporate IT organizations recognize the need for limited Internet access and deploy internal proxies.

Most modern applications provide the ability to configure a proxy. Well, what happens when it doesn’t work? Either you go through the bureaucratic red tape and then hit the bar every evening until your request goes through, or you consume massive amounts of coffee and trace the piece of code that isn’t honoring the proxy. I opted for the latter when our hybris build refused to honor our proxy settings.
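For JVM-based tooling like the hybris build, a useful first check is whether the code that opens connections consults the JVM's default ProxySelector, which reads the standard http.proxyHost and http.proxyPort system properties. This is a general JVM illustration, not the specific hybris code path that ignored our settings; proxy.example.com:3128 is a placeholder.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class ProxyCheckDemo {
    // Asks the JVM's default ProxySelector which proxy it would use for a URL.
    // Code that connects this way honors -Dhttp.proxyHost/-Dhttp.proxyPort;
    // code that passes Proxy.NO_PROXY or opens raw sockets silently ignores them.
    static Proxy proxyFor(String url) {
        List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(url));
        return proxies.get(0);
    }

    public static void main(String[] args) {
        // Equivalent to launching the JVM with -Dhttp.proxyHost=... -Dhttp.proxyPort=...
        System.setProperty("http.proxyHost", "proxy.example.com");
        System.setProperty("http.proxyPort", "3128");
        Proxy p = proxyFor("http://repo.example.com/artifact.jar");
        System.out.println(p.type() + " " + p.address());
    }
}
```

If this reports the expected proxy but the build still goes direct, the offending code is most likely bypassing ProxySelector, which is exactly the kind of thing worth tracing.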


AEM 6 and Classic UI Mode

Well, fellow AEM 6 developers, I have another tiny AEM gem for you. Are you currently working on an AEM 6 project but only targeting the Classic UI? If so, here are two quick OSGi configuration changes that will keep you in the Classic UI.

To make Classic UI the default mode for AEM 6, configure the WCM Authoring UI Mode Service in the Felix console. Simply change the default authoring UI mode from TOUCH to CLASSIC.

I am not a fan of runtime changes in the Felix console, so I recommend checking this configuration into your application as an OSGi configuration node instead. It should be defined in /apps/<your application>/config/com.day.cq.wcm.core.impl.AuthoringUIModeServiceImpl.xml and should look as follows:

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
    xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="sling:OsgiConfig"
    authoringUIModeService.default="CLASSIC"
    authoringUIModeService.editorUrl.classic="/cf#"
    authoringUIModeService.editorUrl.touch="/editor.html"
    />

I also recommend changing the default root mapping so that content authors are directed to the Welcome screen after login. As before, create an OSGi configuration node for your application at /apps/<your application>/config/com.day.cq.commons.servlets.RootMappingServlet.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
    xmlns:jcr="http://www.jcp.org/jcr/1.0" jcr:primaryType="sling:OsgiConfig"
    rootmapping.target="/welcome.html" />

I hope this helps!

AEM 6 and getServiceResourceResolver()

For those of you fortunate enough to work on an AEM 6 project, I am sure you are starting to discover the subtle differences between AEM 5.x and 6.x. Well, I discovered my first painful difference while moving from ResourceResolverFactory.getAdministrativeResourceResolver() to ResourceResolverFactory.getServiceResourceResolver(). Other than the new requirement to configure the Apache Sling Service User Mapper Service, the big difference is that you can no longer hold on to the returned ResourceResolver in long-running services. If you do, you end up with a ResourceResolver whose view of the repository reflects the time the resource resolver was created. When switching over to ResourceResolverFactory.getServiceResourceResolver(), remember to close your resource resolver as soon as you're finished with it.

The Javadocs indicate this, but in my experience most developers rarely take the time to read the documentation.

 

AEM/CQ Reverse Engineering Demystified

Have you ever struggled to figure out how to implement a particular feature in Adobe AEM/CQ? Did your attempts to find what you needed on Google or in discussion groups fail? In your gut, do you know it's possible to implement the feature because AEM performs a very similar operation? If you answered yes to any of these questions, some degree of reverse engineering may do the trick.


AEM Solr Search Now Available


I am happy to announce that AEM Solr Search is finally out in Beta! Visit http://www.aemsolrsearch.com/ and start integrating AEM with Apache Solr. Watch the video for a quick preview on building a rapid front-end search experience, then jump into the Getting Started guide and experiment with the Geometrixx Media Sample application.


adaptTo() 2014 - Integrating Open Source Search with CQ/AEM

I just received confirmation that I will be speaking at adaptTo() 2014. This session describes several approaches for integrating Apache Solr with AEM. It starts with an introduction to various pull and push indexing strategies (e.g., Sling Eventing, content publishing and web crawling). The topic of content ingestion is followed by an approach for delivering rapid search front-end experiences using AEM Solr Search. 

A quick start implementation of the search stack will be provided as part of this presentation. The quick start installer includes pre-configured instances of Apache Solr and Apache Nutch. This presentation will also include the source code for the Community Edition of headwire.com’s AEM Solr Search. AEM Solr Search is a suite of AEM search components and services designed to integrate with Apache Solr. 

There will be a hackathon session afterwards, so it would be great to see you in person.

A Step-by-Step Guide to Indexing CQ with Nutch

In my previous post, Integrating Apache Solr with Adobe CQ / AEM, I talked about the various Solr / CQ integration approaches. In this post, we will index the Geometrixx Media site using Apache Nutch. The integration described here is meant for those with little or no experience with Apache Solr and/or Apache Nutch. Since I am a Mac user, all steps assume a UNIX environment.


Integrating Apache Solr with Adobe CQ / AEM

Recently, I have been noticing a bit of interest from the CQ community regarding CQ / Solr integration. However, as most people have pointed out, there isn't a clear path documented anywhere. Given the interest, I will be posting regularly on the subject. This first post will stay relatively high-level and discuss the possible integration points.

There are really two areas that should be considered when integrating Solr with CQ: indexing content and searching content. For the most part, you can treat these as two independent efforts.

Indexing CQ Content

Over the past 6 months I have experimented with multiple approaches to indexing CQ content in Solr. Each approach has its respective strengths and weaknesses.

  1. Crawl your site using an external crawler.
  2. Create one or more CQ servlets to serialize your content into a Solr JSON or Solr XML Update format.
  3. Create an observer within CQ to listen for page modifications and trigger indexing operations to Solr.

Using an External Crawler

Using an external crawler such as Nutch or Heritrix is perhaps the simplest way to start indexing your CQ content; however, it does have its drawbacks. A crawler works with unstructured content, mainly in the form of HTML documents. While most crawlers do a decent job of extracting the content body, title, URL, description, keywords and other metadata, you typically need to define a strategy for extracting other useful data points to drive functionality such as faceting. This extraction can be achieved in several ways: use an external document processing framework (recommended), use Solr's Update Request Processor (not recommended), use Solr's tokenizers for basic extraction, etc.
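As an example of the kind of rule you might run in an external document processing step, the sketch below derives a hypothetical section facet from a crawled page URL by taking the path segment that follows the language segment. The URL layout and the section field are assumptions about a typical CQ site structure, not something Nutch or Solr provides.

```java
import java.net.URI;

public class UrlFacetDemo {
    // Derives a hypothetical "section" facet value from a crawled page URL,
    // e.g. /content/mysite/en/events/summer-festival.html -> "events".
    static String sectionFromUrl(String url, String languageSegment) {
        String[] segments = URI.create(url).getPath().split("/");
        for (int i = 0; i < segments.length - 1; i++) {
            if (segments[i].equals(languageSegment)) {
                String next = segments[i + 1];
                // Strip an extension such as .html when the section is itself a page
                int dot = next.indexOf('.');
                return dot >= 0 ? next.substring(0, dot) : next;
            }
        }
        return null; // no language segment found: leave the facet unset
    }

    public static void main(String[] args) {
        System.out.println(sectionFromUrl(
                "http://localhost:4503/content/mysite/en/events/summer-festival.html", "en"));
    }
}
```

Keeping rules like this outside Solr means they can be tested and changed without touching the Solr configuration.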

The other drawback with this approach is that it uses a pull approach to indexing content. There are ways around this; however, using a crawler typically means that you will be sacrificing real-time indexing.

CQ Servlets & Solr Update JSON/XML

Another possible approach is to create one or more CQ servlets that produce a dump of your CQ content in Solr's Update JSON or Update XML format. The advantage here is that you are working with structured content and have full access to CQ's APIs for querying JCR content. An external cron job can then fetch this output using curl and post it to Solr.

A variation of this approach is to use a selector to render a page in either the Solr JSON or XML update format. 
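Here is a rough sketch of the output side of such a servlet: given pages already reduced to field/value maps, it emits Solr's JSON update format (a JSON array of documents that can be posted as-is to /update). The escaping is hand-rolled only to keep the example dependency-free; inside CQ you would normally use a JSON library, and the id/title field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SolrJsonUpdateDemo {
    // Minimal JSON string quoting: escapes quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serializes documents (field -> value maps) as a JSON array of objects.
    static String toSolrJson(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < docs.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            int j = 0;
            for (Map.Entry<String, String> e : docs.get(i).entrySet()) {
                if (j++ > 0) sb.append(',');
                sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        Map<String, String> page = new LinkedHashMap<>();
        page.put("id", "/content/mysite/en/about");
        page.put("title", "About \"Us\"");
        System.out.println(toSolrJson(List.of(page)));
    }
}
```

Using the page path as the id keeps re-posts idempotent, since Solr replaces a document that has the same unique key.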

CQ Observer

Using a CQ observer provides the tightest integration with Solr and, as such, real-time indexing capabilities. Like the CQ servlet approach, it simplifies content extraction since you are working with structured data. There are several methods for implementing an observer; refer to Event Handling in CQ by Nicolas Peltier. My personal preference is listening to Page Events and Replication Events using Sling Eventing. In this approach, once you receive an event such as a page modification, you can use the SolrJ API to update the Solr index.
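The core decision in an observer is mapping each incoming event to an index operation. The sketch below uses hypothetical event and action enums (not AEM or Sling APIs) to show one reasonable mapping; the actual SolrJ add or delete call would then be issued based on the returned action.

```java
public class IndexActionDemo {
    // Hypothetical event types, loosely modeled on page and replication events.
    enum ContentEvent { PAGE_CREATED, PAGE_MODIFIED, PAGE_DELETED, ACTIVATED, DEACTIVATED }

    // The two operations the indexer needs to perform against Solr.
    enum IndexAction { ADD_OR_UPDATE, DELETE }

    // Deletions and deactivations remove the document; everything else (re)indexes it.
    static IndexAction actionFor(ContentEvent event) {
        switch (event) {
            case PAGE_DELETED:
            case DEACTIVATED:
                return IndexAction.DELETE;
            default:
                return IndexAction.ADD_OR_UPDATE;
        }
    }

    public static void main(String[] args) {
        for (ContentEvent e : ContentEvent.values()) {
            System.out.println(e + " -> " + actionFor(e));
        }
    }
}
```

Because a Solr add replaces any existing document with the same unique key, create and modify can share one action. For an index that backs the publish site, you might react only to the replication events so unpublished author edits never reach search results.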

Searching CQ

Once you have your CQ content indexed in Solr you will need a search interface. While there are several approaches for building search experiences against Solr, the most popular approach is to use Solr's Java API, SolrJ. For client-side integration, ajax-solr is a great choice.
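Whichever client you use, a faceted search ultimately becomes a request to Solr's /select handler with standard parameters such as q, fq, facet and facet.field. This plain-JDK sketch (Java 10+ for the Charset overload of URLEncoder.encode) builds such a URL; the core name, the section field and the query values are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SolrSelectUrlDemo {
    // Builds a Solr /select URL with a keyword query, one filter query and one facet field.
    static String selectUrl(String solrBase, String core, String q, String fq, String facetField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);                   // user's keywords
        params.put("fq", fq);                 // filter applied after a facet is selected
        params.put("facet", "true");          // turn on faceting
        params.put("facet.field", facetField); // field to count values for
        params.put("wt", "json");             // response format
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return solrBase + "/" + core + "/select?" + query;
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8983/solr", "collection1",
                "summer festival", "section:events", "section"));
    }
}
```

SolrJ and ajax-solr both spare you from assembling these parameters by hand, but seeing the raw request makes it easier to debug what a search component is actually asking Solr for.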

Lastly, I need to shamelessly plug an upcoming integration for CQ and Solr by headwire.com, Inc, aptly named CQ Solr Search. This integration offers support for building search interfaces using search components built on ajax-solr as well as a configurable CQ observer for real-time Solr indexing. We will be introducing the first public implementation on CQ Blueprints. Our intent is to provide one place for searching all CQ/Sling/JCR content on the web.

Upcoming

Based on community feedback, please stay tuned for the following:

  1. CQ Solr Search by headwire.com, Inc. (Not yet available)
  2. A Step-by-Step Guide to Indexing CQ with Nutch (Coming soon)
  3. A Step-by-Step Guide to Indexing CQ with CQ Servlets (Coming soon)
  4. A Step-by-Step Guide to Indexing CQ using an Observer (Coming soon)