Enterprise Content Management

Armedia Blog

Finding Similar Documents Without a Full Text Index

January 27th, 2015 by Scott Roth

Is there a way to quickly find similar documents in a Documentum repository? Yes, there is. One approach is to use the Lucene MoreLikeThis API. This call to the Lucene full-text search engine extracts what it believes to be the most salient words from a document and runs a full-text search for documents whose content matches those words. But what if there were a simpler, lighter-weight approach?

In my 2014 EMC Knowledge Sharing whitepaper, Finding Similar Documents Without Using A Full Text Index, I detail an approach for identifying similar documents in a Documentum repository by using a 64-bit hash value. This hash value, called the Similarity Index (SI), is produced by a hashing function named SimHash[1] and is attached to an object as metadata by an Aspect. The SI can then be queried to find content that is similar to a given document. For example, you could execute a DQL query like this to discover content that is at least 80% similar to a selected document:

select similar_obj_id from dbo.si_view where r_object_id ='[r_object_id]' and similarity >= 0.80

Where [r_object_id] is the object ID of a known object.

Using queries like this, you can discover content at varying degrees of similarity. In this example, the query returns every document that is at least 80% similar to the selected document. For closer matches, raise the threshold, for example to 0.90 for content that is at least 90% similar.

The details for implementing this solution are discussed in the whitepaper. The most interesting elements of the solution are the SimHash function itself, and the relationship between the Aspect, which stores and evaluates the SI, and a registered database view that makes searching possible.
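To make the idea concrete, here is a minimal sketch of a SimHash-style fingerprint in Java. It is not the whitepaper’s implementation: the whitespace tokenizer, the FNV-1a token hash, the class name SimHashSketch, and the definition of similarity as the fraction of matching bits are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative SimHash sketch; names and choices here are assumptions. */
final class SimHashSketch {

    /** 64-bit FNV-1a hash of one token (an assumed choice of token hash). */
    static long hash64(String token) {
        long h = 0xcbf29ce484222325L;          // FNV-1a 64-bit offset basis
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;               // FNV 64-bit prime
        }
        return h;
    }

    /** Builds a 64-bit fingerprint from whitespace-separated tokens. */
    static long fingerprint(String text) {
        int[] votes = new int[64];
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            long h = hash64(token);
            // each token votes +1 or -1 on every bit position
            for (int bit = 0; bit < 64; bit++) {
                votes[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fp = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (votes[bit] > 0) {
                fp |= (1L << bit);             // majority of tokens set this bit
            }
        }
        return fp;
    }

    /** Fraction of the 64 bit positions on which two fingerprints agree. */
    static double similarity(long a, long b) {
        return 1.0 - Long.bitCount(a ^ b) / 64.0;
    }
}
```

Under these assumptions, a threshold of 0.80 corresponds to two fingerprints that differ in at most 12 of their 64 bits.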

If you are intrigued, I encourage you to download the whitepaper.

[1] Moses Charikar, http://www.cs.princeton.edu/courses/archive/spr04/cos598B/bib/CharikarEstim.pdf

VIDEO: Armedia Case Manager – Editing Your User Profile

January 20th, 2015 by Allison Cotney

We have uploaded a new video to the Armedia YouTube Channel! In this video we demonstrate how users can update or change their profile information within Armedia Case Manager.

These changes could include items such as groups that the user belongs to or subscriptions the user may have. And of course, the user profile picture is always customizable.



Stay tuned for more Armedia Case Manager Videos!!

VIDEO – Armedia Case Manager: Generating a Report

January 16th, 2015 by Allison Cotney

Check out our new video blog giving you an inside look at Armedia Case Manager!

In this post, Ronda Ringo demonstrates how users can generate a report within Armedia Case Manager.

Stay tuned for more videos coming soon! To see all of our Armedia Case Manager Videos, CLICK HERE.

VIDEO: Armedia Case Manager: The Dashboard

January 15th, 2015 by Allison Cotney

In today’s video, we give you a tour of the Armedia Case Manager configurable dashboard. This dashboard provides a quick and easy way for users to access commonly needed components of their case management solution.

The dashboard is also customizable so that users can put things in an order that makes sense for their needs.

Stay tuned for more Armedia Case Manager videos coming next week!!

Create a Self-Extracting Installer in Linux

November 19th, 2014 by Paul Combs

Ultimately, many would want to write complex RPMs to install software packages. However, those who are accountable for writing such packages may agree that the task can be cumbersome and impractical. An RPM accomplishes the goal of wrapping an application in a container for distribution, but the same goal can be met with a self-extracting archive or an installer that launches an embedded script. This is where a utility called makeself comes in. Makeself is described as “a small shell script that generates a self-extractable tar.gz archive from a directory. The resulting file appears as a shell script (many of those have a .run suffix), and can be launched as is.”

Install makeself

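cd /opt/app
wget http://megastep.org/makeself/makeself-2.1.5.run
chmod 755 makeself-2.1.5.run
# run the archive so that it unpacks into ./makeself-2.1.5
./makeself-2.1.5.run
cd makeself-2.1.5
# copy makeself.sh and its helper scripts onto the PATH
cp *.sh /usr/bin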

Suppose you want to package and distribute a version of libevent2 for several CentOS 6 servers. Here is one way.

mkdir rpmfile
cd rpmfile

wget ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/home:/aevseev/CentOS_CentOS-6/x86_64/libevent2-2.0.21-1.1.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/home:/aevseev/CentOS_CentOS-6/x86_64/libevent-devel-2.0.21-1.1.x86_64.rpm

echo '#!/bin/bash
yum install -y libevent2-2.0.21-1.1.x86_64.rpm libevent-devel-2.0.21-1.1.x86_64.rpm' > libevent-install.sh
chmod +x libevent-install.sh
cd ..


makeself.sh ./rpmfile ./libevent-devel.run "SFX installer for libevent-devel (2.0.21)" ./libevent-install.sh

Use the .run file



CMIS Integration – Integrating FileNet with SharePoint 2013

October 17th, 2014 by Ben Chevallereau

Recently, our team has been working on a series of CMIS integrations. This video demonstrates the CMIS components that we developed and used to integrate FileNet with SharePoint 2013. This integration has been packaged into SharePoint. During the video, you’ll see how to connect to FileNet, browse the repository, create folders and documents, and preview and download documents.

Predictive Analytics and The Most Important Thing

October 15th, 2014 by Jim Nasr

It turns out I was wrong…which happens at an alarmingly increasing rate these days, though I chalk that up to a thirst to challenge myself…errr, my story!

So, for a while now, I had convinced myself that I knew what the most important thing was about successfully doing predictive analytics: accuracy of data and the model (separating the noise). Veracity, as they say. In working with a few clients lately though, I no longer think that’s the case. Seems the most important thing is actually the first thing: What is the thing you want to know? The Question.

As technologists, we often tend to over-complicate and possibly over-engineer. And it’s easy to make predictive analytics focus on the how: the myriad ways to integrate large volumes and exotic varieties of data, the many statistical models to evaluate for fit, the integration of the technology components, the visualization techniques used to best surface results, etc. All of that has its place. But ultimately, first, and most importantly, we need to articulate the business problem and the question we want answered.

What do we want to achieve from the analytics? How will the results help us make a decision?

Easy as that sounds, in practice it is not particularly easy to articulate the business question. It requires a real understanding of the business, its underlying operations, data and analytics and what would really move the meter. There is a need to marry the subject matter expert (say, the line of business owner) with a quant or a data scientist and facilitate the conversation. This is where we figure out the general shape and size of the result and why it would matter; also, what data (internal and external) feeds into it.

Articulating The Question engages the rest of the machinery. Answers are the outcome we care about. The process and the machinery (see below for how we do it) give us repeatability and ways to experiment with both asking questions and getting answers.

[Figure: Armedia Predictive Analytics process for getting from The Question to answers]

VIDEO- Alfresco CMIS Integration- A Sneak Peek

September 19th, 2014 by Ben Chevallereau

For a few months now, my fellow team members and I have been developing a CMIS integration that allows Alfresco to be accessed seamlessly from other platforms. This video demonstrates the components that we built on top of the CMIS 1.1 standard and packaged for platforms such as SharePoint 2013, SharePoint 2010, and Drupal. Using these components, you can browse your repository, find your documents, upload documents by drag and drop, edit online, or use full-text or advanced search. This video focuses mainly on the integration with Alfresco, but the components can be used with any CMIS 1.1 compliant repository. To complement these components, we also created a filtered search component. It works only with Alfresco, but with any version. Using this component, you can run a full-text search and filter the results by metadata such as file type, creator, creation date, or file size.

These components are built only with JS, HTML, and CSS files, which is why they are so easy to repackage for other web platforms. Moreover, we built them to be highly customizable. Depending on your use case, you can customize these components to display relevant metadata, focus on a specific folder, add new filters, and a lot more.



For more information about our CMIS integration with Alfresco, join us next week in San Francisco for Alfresco Summit 2014!

CLICK HERE to register for this event.

Spring Managed Alfresco Custom Activiti Java Delegates

September 17th, 2014 by Judy Hsu

I recently needed to make a change so that Alfresco 4’s Activiti calls an object managed by Spring instead of a class that it instantiates during execution. There were a couple of reasons for this:

  1. A new enhancement was necessary to access a custom database table, so I needed to inject a DAO bean into the Activiti serviceTask.
  2. Refactoring of the code base was needed. Having Spring manage the Java delegate service task, rather than instantiating a new object for each process execution, is the better way to go when the application is already Spring managed (which Alfresco is).
    • i.e., I needed access to the DAO bean and to the Spring beans that Alfresco makes available.
    • NOTE:  You now have to make sure your class is thread safe though!

For a tutorial on Alfresco’s advanced workflows with Activiti, take a look at Jeff Potts’s tutorial here. This post discusses only what was refactored to have Spring manage the Activiti engine Java delegates.

I wanted to piggy-back off of the Activiti workflow engine that is already embedded in Alfresco 4, so decided not to define our own Activiti engine manually.  The Alfresco Summit 2013 had a great video tutorial, which helped immensely to refactor the “Old Method” to the “New Method”, described below.


For our example, we’ll use a simple Activiti workflow that defines two service tasks, CherryJavaDelegate and ShoeJavaDelegate (the abstract AbstractCherryShoeDelegate is their parent). The “Old Method” does NOT have Spring managing the Activiti service task Java delegates. The “New Method” has Spring manage and inject the Activiti service task Java delegates, and also adds an enhancement for both service tasks to write to a database table.

Old Method

1. Notice that the cherryshoebpmn.xml example below defines the serviceTasks with the “activiti:class” attribute; this has Activiti instantiate a new object for each process execution:

<process id="cherryshoeProcess" name="Cherry Shoe Process" isExecutable="true">
    <serviceTask id="cherryTask" name="Insert Cherry Task" activiti:class="com.cherryshoe.activiti.delegate.CherryJavaDelegate"></serviceTask>
    <serviceTask id="shoeTask" name="Insert Shoe Task" activiti:class="com.cherryshoe.activiti.delegate.ShoeJavaDelegate"></serviceTask>
    <!-- remaining process elements omitted -->
</process>

2. Since multiple service tasks need the same Activiti Java delegate behavior, we defined an abstract class that implements the shared functionality. The concrete classes provide or override anything the abstract class does not define.

import org.activiti.engine.delegate.DelegateExecution;
import org.activiti.engine.delegate.JavaDelegate;

public abstract class AbstractCherryShoeDelegate implements JavaDelegate {
    public void execute(DelegateExecution execution) throws Exception {
        // shared logic for both service tasks
    }
}

// in its own source file
public class CherryJavaDelegate extends AbstractCherryShoeDelegate {
}

New Method

Here’s a summary of everything that had to happen to have Spring inject the Java delegates for the custom Activiti service tasks in Alfresco 4 (tested with Alfresco 4.1.5) and to write to database tables through injected DAO beans.

  1. The abstract AbstractCherryShoeDelegate class extends Alfresco’s BaseJavaDelegate
  2. There are class load order issues where custom Spring beans will not get registered. Set up a depends-on relationship with the activitiBeanRegistry for the AbstractCherryShoeDelegate abstract parent
  3. The following must be kept intact:
    • In the Spring configuration file,
      • The abstract AbstractCherryShoeDelegate bean defines parent="baseJavaDelegate" abstract="true" depends-on="activitiBeanRegistry"
      • For each concrete Java Delegate:
        • The concrete bean id MUST match the class name, which in turn matches the activiti:delegateExpression in the bpmn20 configuration XML file
          • NOTE: According to this Alfresco forum thread, the activitiBeanRegistry appears to register the bean by class name, not by bean id, so this is likely not a requirement
        • The parent MUST be declared through the bean’s parent attribute

Details Below:

1. Define Spring beans for the abstract parent class AbstractCherryShoeDelegate and for each concrete class that extends it (i.e., CherryJavaDelegate and ShoeJavaDelegate), so that Spring manages the custom Activiti Java delegates. The abstract parent must define its own parent as "baseJavaDelegate", with abstract="true" and depends-on="activitiBeanRegistry".

<bean id="AbstractCherryShoeDelegate" parent="baseJavaDelegate" abstract="true" depends-on="activitiBeanRegistry"></bean>
<bean id="CherryJavaDelegate"
class="com.cherryshoe.activiti.delegate.CherryJavaDelegate" parent="AbstractCherryShoeDelegate">
    <property name="cherryDao" ref="com.cherryshoe.database.dao.CherryDao"/>
</bean>

<bean id="ShoeJavaDelegate"
class="com.cherryshoe.activiti.delegate.ShoeJavaDelegate"  parent="AbstractCherryShoeDelegate">
    <property name="shoeDao" ref="com.cherryshoe.database.dao.ShoeDao"/>
</bean>


- Do NOT put any periods to denote package structure in the bean id! Alfresco/Activiti got confused by the package “.”, even though Spring normally works fine with this construct.

- Also, having the concrete class extend the abstract parent class is not enough on its own to make it work; the bean definition needs the parent attribute as well. Bean definitions like these do NOT work:

<bean id="com.cherryshoe.activiti.delegate.CherryJavaDelegate"
class="com.cherryshoe.activiti.delegate.CherryJavaDelegate" >
    <property name="cherryDao" ref="com.cherryshoe.database.dao.CherryDao"/>
</bean>

<bean id="com.cherryshoe.activiti.delegate.ShoeJavaDelegate"
class="com.cherryshoe.activiti.delegate.ShoeJavaDelegate" >
    <property name="shoeDao" ref="com.cherryshoe.database.dao.ShoeDao"/>
</bean>

2. Notice that the cherryshoebpmn.xml example below is using the “activiti:delegateExpression” attribute and referencing the Spring bean.  This means only one instance of that Java class is created for the serviceTask it is defined on, so the class must be implemented with thread-safety in mind:

<process id="cherryshoeProcess" name="Cherry Shoe Process" isExecutable="true">
    <serviceTask id="cherryTask" name="Insert Cherry Task" activiti:delegateExpression="${CherryJavaDelegate}"></serviceTask>
    <serviceTask id="shoeTask" name="Insert Shoe Task" activiti:delegateExpression="${ShoeJavaDelegate}"></serviceTask>
    <!-- remaining process elements omitted -->
</process>

3.  The abstract class is now changed to extend the BaseJavaDelegate.  The specific concrete classes would provide / override any functionality not defined in the abstract class. 

import org.activiti.engine.delegate.DelegateExecution;
import org.alfresco.repo.workflow.activiti.BaseJavaDelegate;

public abstract class AbstractCherryShoeDelegate extends BaseJavaDelegate {
    public void execute(DelegateExecution execution) throws Exception {
        // shared logic for both service tasks
    }
}

// in its own source file
public class CherryJavaDelegate extends AbstractCherryShoeDelegate {
}
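Because Spring hands the same delegate instance to every process execution, the simplest way to stay thread safe is to keep per-execution data out of instance fields. Here is a minimal sketch of that pattern. It uses hypothetical stand-in types (ExecutionSketch, CherryDaoSketch, CherryDelegateSketch) and made-up variable names ("processId", "cherry") rather than the real Activiti and Alfresco classes, so that it compiles on its own:

```java
/** Hypothetical stand-in for Activiti's DelegateExecution. */
interface ExecutionSketch {
    Object getVariable(String name);
}

/** Hypothetical stand-in for the DAO bean that Spring injects. */
interface CherryDaoSketch {
    void insert(String processId, String value);
}

/** Singleton-style delegate whose only field is the injected collaborator. */
final class CherryDelegateSketch {

    // set once by the container at startup, never reassigned afterwards
    private CherryDaoSketch cherryDao;

    void setCherryDao(CherryDaoSketch cherryDao) {
        this.cherryDao = cherryDao;
    }

    void execute(ExecutionSketch execution) {
        // per-execution data lives in local variables, so concurrent
        // executions sharing this instance cannot overwrite each other
        String processId = String.valueOf(execution.getVariable("processId"));
        String cherry = String.valueOf(execution.getVariable("cherry"));
        cherryDao.insert(processId, cherry);
    }
}
```

The design choice is that anything specific to one execution is read from the execution object each time; the injected DAO itself must also be safe to share across threads.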

For more examples and ideas, I encourage you to explore the links provided throughout this blog. Also take a look at Activiti’s user guide, particularly the Java Service Task Implementation section. What questions do you have about this post? Let me know in the comments section below, and I will answer each one.

The blog Spring Managed Alfresco Custom Activiti Java Delegates was originally posted on cherryshoe.blogspot.com.

U. S. Government Digital Acquisition Policy Gets an Update

September 11th, 2014 by Scott Roth

You may have seen the news that the U. S. Government has established the U.S. Digital Service, a small team designed “to improve and simplify the digital experience that people and businesses have with their government.” On the heels of that announcement came the news that Michael Dickerson, former Google engineer, has been selected to head up the U. S. Digital Service. And, in conjunction with these announcements, came some initial updates to the U. S. Government’s acquisition policies as they relate to software and computing solutions. It is these updates I would like to highlight in this post.

These initial updates come in the form of two documents, The Digital Services Playbook and the TechFAR, which really go hand-in-hand. The Playbook lays out best practices for creating digital services in the government, and the TechFAR describes how these services can be acquired within the confines of existing acquisition policy (i.e., the FAR). The Playbook discusses 13 “plays”, or best practices, that should be implemented to ensure delivery of quality applications, websites, mobile apps, etc., that meet the needs of the people and government agencies. Advocating and implementing these plays will be the Digital Service’s mission. As a long-time provider of software development services, I wasn’t too surprised by any of these best practices, and you probably won’t be either. However, it was refreshing to see the government finally embrace and advocate them. Here are the Digital Services Playbook plays.

  1. Understand what people need
  2. Address the whole experience, from start to finish
  3. Make it simple and intuitive
  4. Build the service using agile and iterative practices
  5. Structure budgets and contracts to support delivery
  6. Assign one leader and hold that person accountable
  7. Bring in experienced teams
  8. Choose a modern technology stack
  9. Deploy in a flexible hosting environment
  10. Automate testing and deployments
  11. Manage security and privacy through reusable processes
  12. Use data to drive decisions
  13. Default to open

Like I said, you probably weren’t surprised by these practices. In fact, if you are a successful software services company, you probably already implement them. But remember, these practices are now being embraced by the U. S. Government, whose acquisition policy has traditionally been geared more toward building battleships than software solutions.

Speaking of acquisition, the TechFAR is a handbook that supplements the Federal Acquisition Regulations (FAR). The FAR is a strict and lengthy body of regulations all executive branch agencies must follow to acquire goods and services. The Handbook is a series of questions, answers, and examples designed to help the U. S. Government produce solicitations for digital services that embrace the 13 plays in the Digital Services Playbook. At first glance, you may not think that implementing these practices would require a supplement like the Handbook, but if you have any experience with the FAR, or agencies that follow it, you will understand that interpretation and implementation of the regulations vary from agency to agency, and they usually err on the side of caution (i.e., strict interpretation of the policy).

In my experience, the single most difficult thing for a U. S. Government agency to accomplish under the FAR is play #4, the use of agile methodologies to develop software solutions. If you can accomplish this, many of the other plays will happen naturally (e.g., #1, #2, #3, #6, #7, #10). However, the nature of agile development – user stories vs. full system requirements, heavy customer participation vs. just follow the project plan, etc. – seems contrary to the “big design” methodology implied by the FAR. This notion couldn’t be more wrong. The TechFAR encourages the use of agile methodologies and illustrates how solicitations and contracts can be structured to be more agile.

Personally, I think the Digital Services Playbook and the TechFAR are a great starting point for improving the quality and success of government software solutions. And official guidance like this now brings the U. S. Government’s acquisition process in line with how Armedia has always developed software solutions, i.e., using agile methodology. No longer will we have to map our methodology and deliverables to an archaic waterfall methodology to satisfy FAR requirements.

I think the questions/answers/examples in the TechFAR are good, and provide terrific insight for both the government writing solicitations, and industry responding to them. If you sell digital services to the U. S. Government, I encourage you to read these two documents, the Digital Services Playbook and the TechFAR  — they’re not long. And even if you don’t contract with the U. S. Government, the best practices in the Playbook and the advice in the Handbook are still probably applicable to your business.
