Armedia Blog

How to Export Tabular Data in Captiva 7

July 29th, 2014 by Scott Roth

Armedia has a customer using Captiva 7 to automatically capture tabular information from scanned documents. They wanted to export the tabular data to a CSV file to be analyzed in Excel. Capturing the tabular data in Captiva Desktop proved simple enough; the challenge was exporting it in the desired format. Our customer wanted each batch to create its own CSV file, and that file needed to contain a combination of fielded and tabular data expressed as comma-delimited rows.

Here is an example of one of the scanned documents with the desired data elements highlighted.

Timecard-Scan-Metadata-Highlighted

Here is an example of the desired output.

EMPLOYEE,EID,DATE,REG HRS,OT HRS,TOT HRS
ANDREW MARSH,084224,4/22/2013,7,0,7
ANDREW MARSH,084224,4/23/2013,7.5,1,7.5
ANDREW MARSH,084224,4/24/2013,4,0,9
ANDREW MARSH,084224,4/25/2013,8.5,0,8.5
ANDREW MARSH,084224,4/26/2013,12,0,12
BARB ACKEW,084220,4/22/2013,7,0,7
BARB ACKEW,084220,4/23/2013,9.5,0,9.5
BARB ACKEW,084220,4/24/2013,9.5,0,9.5
BARB ACKEW,084220,4/25/2013,2.5,0,2.5
BARB ACKEW,084220,4/26/2013,8,.5,8

As you can see, the single fields of Employee Name and Employee Number are repeated on each row of the output.  However, because Employee Name and Employee Number were not captured as part of the tabular data on the document, this export format proved to be a challenge.

Here’s what I did:

  1. In the Document Type definition, I created fields for the values I wanted to capture and export (Name, EmployeeNbr, Date, RegHrs, OTHrs, TotHrs).  Here’s how it looks in the Document Type editor:

CC7-doctype

  2. In the Desktop configuration, I configured:
    • Output IA Values Destination: Desktop
    • Output dynamic Values: checked
    • Output Array Fields: Value Per Array Field
  3. Finally, I created a Standard Export profile that output the captured fields as a text file, not a CSV file. I named the file with a “CSV” extension so Excel could easily open it, but to create the required output format, the file had to be written as a text file. Here is what the Text File export profile looks like:

CC7-export

The content of the Text file export profile is:

EMPLOYEE,EID,DATE,REG HRS,OT HRS,TOT HRS
---- Start repeat for each level 1 node ----
---- Start repeat for each row of table: Desktop:1.UimData.Hours ----
{S|Desktop:1.UimData.Name},{S|Desktop:1.UimData.EmployeeNbr},{S|Desktop:1.UimData.Date},{S|Desktop:1.UimData.RegHrs},{S|Desktop:1.UimData.OTHrs},{S|Desktop:1.UimData.TotHrs}
---- End repeat ----
---- End repeat ----

By using two nested loops, I was able to access the non-tabular fields (Name and EmployeeNbr) as well as the tabular fields in the same output statement. This looping feature of the Text File export profile saved me from having to write a CaptureFlow script to iterate through all the table variables and concatenate strings for export. A nice feature, but not well documented.
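To make the looping concrete, here is a minimal Java sketch of the equivalent nested-loop logic the export profile performs. The Timecard holder class and field names are hypothetical and only illustrate the structure; this is not Captiva API code.

import java.util.List;

public class TimecardCsvWriter {

    // Hypothetical holder for one captured document: the two single fields
    // plus the rows of the Hours table.
    static class Timecard {
        String name;
        String employeeNbr;
        List<String[]> hoursRows; // each row: {date, regHrs, otHrs, totHrs}

        Timecard(String name, String employeeNbr, List<String[]> hoursRows) {
            this.name = name;
            this.employeeNbr = employeeNbr;
            this.hoursRows = hoursRows;
        }
    }

    // Outer loop = "repeat for each level 1 node" (one document);
    // inner loop = "repeat for each row of table". The single fields are
    // simply repeated on every emitted row.
    static String toCsv(List<Timecard> batch) {
        StringBuilder out = new StringBuilder("EMPLOYEE,EID,DATE,REG HRS,OT HRS,TOT HRS\n");
        for (Timecard doc : batch) {
            for (String[] row : doc.hoursRows) {
                out.append(doc.name).append(',')
                   .append(doc.employeeNbr).append(',')
                   .append(row[0]).append(',')
                   .append(row[1]).append(',')
                   .append(row[2]).append(',')
                   .append(row[3]).append('\n');
            }
        }
        return out.toString();
    }
}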

Good Times With VirtualBox Networking

July 24th, 2014 by David Miller

TL;DR version: if you run multiple VirtualBox VMs on the same desktop, set up three network interfaces on each such VM (one NAT network, one host-only, one bridged).

Now for the long, more entertaining (hopefully!) version:

Recently I switched from VMware Workstation to Oracle VirtualBox for my personal virtualization needs. I’m very happy overall. VirtualBox seems faster to me – when I minimize a VM, do lots of other work, then restore the VM, it is responsive right away, whereas VMware would page for a minute or two. And each VirtualBox VM is in a separate host window, which I like more than VMware’s single tabbed window.

Still, I must say VMware’s networking was easier to deal with.  Here’s how I ended up with 3 IP addresses in each of my local VMs…

I have a CentOS VM running Alfresco and Oracle; a Fedora VM running Apache SOLR and IntelliJ IDEA; and a Windows 2012 Server VM running Active Directory.  I need connectivity to each of them from my host desktop (Windows 8.1), and they need connectivity to each other, and they need to be able to connect to Armedia’s corporate VMs.  Plus,  I’d rather not update my hosts file or IP settings every time  I move between the office and home!

1st VirtualBox network: a Network Address Translation (NAT) network, which allows each VM to talk to the other VMs but not to any other machine, and does not allow connections from the host desktop. This meets Goal #2 (connectivity to each other). But Goals #1 and #3 are not met yet.

2nd VirtualBox network: a VirtualBox Host-Only network which allows connectivity from the host desktop.  Now Goals #1 (connectivity from the host) and #2 (connectivity to each other) are just fine.

Also, both the NAT and the host-only network offer stable IP addresses; whether at home or at work, my VMs get the same address each time, so I don’t spend 10 minutes updating IP references every time I switch location.

Danger! Here is where VirtualBox tricks you! It seems like Goal #3 (access to corporate VMs) is met too! With the NAT and host-only IP addresses, I can see our internal websites and copy smaller files to and from the data center VMs. But if I transfer a larger file, I get a Connection Reset error! Twice in the last month, I’ve spent hours tracking down the “defect” in the corporate network settings. (You’d think I’d remember the problem the second time around, but in my defense the error manifested in different ways.)

Solution?  Add the 3rd VirtualBox network: a bridged network (i.e. bridged to your physical network adapter, so this network causes each VM to have an IP address just like the host gets, from the corporate/home DHCP server): Now the 3rd goal is really met!  I can transfer files all day long, no worries.

Something to watch out for: when you disconnect a wired Ethernet cable, VirtualBox automatically changes the bridged network to bind to your wireless interface. This is nice since your VMs automatically get new addresses. BUT! When you plug in the Ethernet again (which in my case deactivates the wireless), VirtualBox does NOT switch back to the wired interface! That happened to me this morning. I spent a few hours trying to figure out why my file uploads failed. Finally I saw where VirtualBox had re-bound my bridged network. I changed it back to the wired interface, and all was well.

Alfresco Records Management: An Approach to Implementation Part II

July 22nd, 2014 by Deja Nichols

In the 1st part of this blog, “Alfresco Records Management: An Approach to Implementation – PART 1,” I went over the business case and planning phase for a medium-sized agency that wanted a seamless records management configuration, leveraging Alfresco’s Enterprise Content Management (ECM) system and Records Management (RM) Module.

To figure out how we wanted to go about design and implementation, and how to configure the system properly, we needed to get an idea of the basic lifecycle of our documents and records. We needed to see where we were going. To build a castle, you need to know how much total space and land you need, what materials you need, and what it is going to cost. Even if it’s just a general idea, it’s best to map out what you want and what is required for the whole project first. You can’t just start with one room of a castle and “see where it takes you.” I have personally seen that it is the same with building ECM and RM systems. Different documents can have different lifecycles, but here is a general example of a possible lifecycle for an HR Complaint:

HR_Document_Lifecycle_Example

 

 

In this blog, Part 2, I’m going to go over the last two general aspects of how we set up and implemented Alfresco in order to accomplish our ideal records management configuration:

  • Configuration
  • Implementation

Steps for Creating a Records Management Program

 

In order to best describe our configuration and implementation phase, I want to go over some very basic aspects of how things were set up in Alfresco. Although we had an older version of Alfresco, most of this was out of the box with little configuration. So here are the basic aspects we created in Alfresco that were important to the layout of the system:

  1. Group sites
  2. Document library within each group site
  3. Document types
  4. Metadata
  5. Record Declaration
  6. Seamless User Interaction
  7. Records Management (RM) Module  (aka RM repository or File Plan)
  8. Workflows for certain documents and records

Alfresco Group Sites Example

 

 

Alfresco Group sites:

To break it down, let’s start with the basic structure of our company. Like most companies, we have a hierarchical structure, about seven different departments, and about 200 employees. Every employee belongs to one department, so we set up each department with a “group site” in Alfresco (Human Resources site, Finance site, Legal site, etc.; this is an out-of-the-box feature in Alfresco).

Alfresco Document Library:

Each department group site has its own file repository. In Alfresco it is called a “Document Library,” which, per our records policies, was deemed to be the single-source repository for all of that department’s electronic documents.

Document types:

Each document library can be set up with a unique set of “Document Types” to categorize documents into your file taxonomy, and these can be unique to each group site’s document library. (For example, the Human Resources document library may have “Employee Contracts” and “Resumes” as two possible document types, while Finance may have “Vendor Contracts” and “Invoices,” etc.)

The idea was that when an employee uploaded a document to their department’s document library, they were prompted to select a document type. You can also set up a sub-document type if that is necessary per the retention schedule or file taxonomy.

Metadata:

Then we configured the system to require the user to enter any applicable metadata for the document they are uploading (as required by our documents matrix in PART 1). Some of our documents needed extra properties (metadata) to help with mapping them to the correct location for retention purposes. For example, for the document type “Resume” we wanted to add the metadata field “Name of Employee” so that the system knew which records management folder to put it in (which I will go over in more detail later in this blog). Each record uploaded typically needed only one extra piece of information to correctly categorize it for records management purposes.

Alfresco Records Management Home Page

 

Record Declaration

The last step we configured for document upload was the ability to declare the document an official record, which only showed up for document types that had a retention policy associated with them. If the user chose a document type that was predetermined to be a record (as opposed to a non-record), then they were given the option to choose whether or not the document they were uploading should be declared as an official record. What this means is that if the document was “complete” and ready to be an “official record,” then checking that box would immediately declare it an official record upon ingestion. But if the document was still a “work in progress,” and not yet an official “record,” the user could simply leave the box unchecked and declare the document an official record at a later date, when it was fully completed. (Example: if the document type is a “Contract,” and it is still being worked on when it is uploaded, they would not check the “declare record” box upon ingestion. Then, after some time, when that contract gets officially signed, they can declare it a record.)

For us, the word “complete” was defined per document by the document matrix: basically, “when a document is considered complete” and/or “at which point it becomes an official record.” One example of the declaration criteria was: “After the document (in this case, a contract) was officially approved AND all stakeholders had signed off on it, it could be declared a record.” For some records this was not applicable, such as articles of incorporation, bills, financial statements, etc. These were automatically official records upon ingestion and immediately sent to the RM module for retention, since they were un-editable documents. So anything of that document type was not given the option to “declare it a record”; it was automatically declared upon ingestion.

Process for Declaring a Record

 

In most cases, we found the user usually knows what they are uploading. They usually upload their own work into the system and they usually know if that work is still “in progress” or “complete” etc.  We also found that it was not even necessary to teach users the document matrix because most of them knew what they were working on like the back of their hand. Thus, this method worked for us and we did not have to turn our end users into Records Managers! They only needed to know 3 basic things:

1. What the document type was (invoice, contract, financial report)

2. What the metadata was (date of document or name of employee, etc., usually only one piece of extra information was needed)

3. Is it still being edited or otherwise worked on, or can it be declared a record now?

Seamless User Interaction:

We wanted our users to be able to see the records in their own context. What I mean is, we didn’t want them to have to go look for their documents in two places. We didn’t want them to have to worry about, or even know about, the RM module in Alfresco. All they needed to know was that they upload documents to their group site, and the RM works behind the scenes. So we set it up so that if they are searching for documents on their group site, documents that were sent to the RM module (from that site) also show up in the search results. They can open them, view them, and collaborate on them without ever leaving their group site. (You can also set up a visual indicator marking each document as an “official record” so you can tell which ones have been sent.)

Alfresco Records Management In Place Records Declaration

 

Alfresco Records Management Module:

When someone sets up the Alfresco RM module, it allows them to create folders in what is called a “File Plan.” The Records Manager can then set retention rules (that coincide with the retention schedule) on those folders. From there, documents are mapped to the File Plan folders (using the documents matrix as a guide) when a document type and metadata combination is placed on a document and it is declared a record. That File Plan folder then runs the retention on the documents.

Alfresco Records Management Module File Structure Example

 

When the user selects “Yes” to the question “declare official record?” (whether upon ingestion or later declared), it tells the system that this file can now be sent directly to the File Plan in the Alfresco RM Module.

Example:

Now let’s take a look at a practical example of how a file gets uploaded into the system, ends up in the File Plan, and has retention policies applied to it. (The quoted values below are our input variables.)

Actor: wants to upload an old invoice they found into the Finance group site. They first enter the document type: “Invoice.” Because “Invoice” was selected, the required metadata for that document type pops up next, which was configured to be “Year.” The actor enters a year: “2007.” Since the “Invoice” doc type has a retention period connected to it, per configuration, the actor sees the checkbox for “Official Record?” The actor checks the box, which = true (or Yes). Computer: from this input, it knows exactly:

  • Where to put this file (was configured to: RM Site/File Plan/Finance/Invoices/2007)
  • When to put it there (was configured to: “Declare Official Record?” yes = immediately an official record = sent to the File Plan immediately)
  • How long it stays there (the document was placed in the Invoices folder, which keeps records for the current year + 6 years per the rules placed on it by the Records Manager; since it was placed in the “2007” folder, the system knows when to start the retention: 2007 + 6 years means the invoice is retained through the end of 2013 and discarded in January 2014. A small date calculation sketch follows below.)
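As a quick illustration of that retention arithmetic, here is a tiny Java sketch. The folder year and retention period are the values from the example above; this is plain date math, not Alfresco API code.

import java.time.LocalDate;
import java.time.Month;

public class RetentionDateExample {

    public static void main(String[] args) {
        int folderYear = 2007;   // from the "Year" metadata / the "2007" File Plan folder
        int retentionYears = 6;  // rule the Records Manager placed on the Invoices folder

        // Retention runs through the end of 2007 + 6 = 2013, so the invoice
        // becomes eligible for disposition at the start of the following January.
        LocalDate dispositionDate = LocalDate.of(folderYear + retentionYears + 1, Month.JANUARY, 1);
        System.out.println("Eligible for disposition on: " + dispositionDate); // 2014-01-01
    }
}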

 

Alfresco Records Management- Uploading a Document

 

If a document is not ready to be declared an official record upon ingestion (if you are still editing a contract for example) then one can keep it in the system and declare it a record when it is ready/complete/approved etc.

Typical_RM_Lifecycle

 

This diagram (above) shows the flow of a typical record lifecycle: upload, assign doc type and metadata; if not yet a record, edit/collaborate and later declare it a record; then retain and discard (if applicable). Upon upload, the document has its entire life already mapped out for it, depending on the configuration of the document types, metadata, and File Plan. (Please note, there are some official records that are never discarded and have a “Permanent” retention. The File Plan can accommodate these types of files as well, and the above model would need to be slightly modified to account for that. You don’t even need to ask whether it’s an official record on permanent records, as discussed earlier in this blog.)

Workflows:

Another popular option when declaring a record is to put the document through a workflow that takes it through an approval process, and once the document gets approved via the workflow, it automatically declares itself a record. The system, from that point on, knows all the information it needs in order to retain it, and if applicable, dispose of it per policy.

When creating workflows, an important step is knowing what kinds of workflows your company needs for its records, documents, and files. This may be intimately connected to the lifecycle and management of your records, so I suggest keeping it in mind when mapping out your system. For more information from Armedia about workflows in Alfresco, see our Blog Subsection on Alfresco Workflows.

In closing:

This approach is primarily a “day forward” solution. When it comes to migration, a different approach may be needed so files can be ingested into the new system and arrive at the correct location within Alfresco.

Also I would like to note that this approach might work best with a customized user interface for more flexibility.

There are many different ways to go about implementing Records Management, and companies need flexible customization that will work for their business processes and records management needs. This method can help you get started on your own configuration and implementation.

For more information on Records management, check out our white paper: “Records Management: An Approach to Getting Started”

To read more Armedia Blogs about Alfresco, See these links: Alfresco Records Management, Alfresco ECM

Exciting News for Blind or Visually Impaired People Using SharePoint

July 17th, 2014 by Doug Loo

AFB Consulting (AFBC), the consulting division of the American Foundation for the Blind (AFB), conducted a comprehensive accessibility and usability evaluation of the Discover 508 for SharePoint software from Discover Technologies of Reston VA. The product evaluation compared Out of the Box (OOTB) SharePoint in SP 2010 and SP O-365 with Discover 508 for SharePoint, a software solution designed to make SharePoint easier to use. AFBC tested and compared these products based on how well they interact with various screen reader software products used by people with vision loss to access Windows computers.

Testing results clearly illustrate that Discover 508 has significant usability advantages over the out of the box experience in both environments. The usability and accessibility advantages allow a blind or visually impaired person to complete tasks much more easily and quickly than in the OOTB environment. The superior usability comes largely from a more intuitive, well-designed architecture that is easier to navigate and more suitable for efficiently accomplishing tasks. It also lacks inaccessible pop-ups and other features designed with only sighted users in mind. A clear and easy to use set of instructions provided by Discover 508 is another significant advantage over the OOTB experience, providing step-by-step guidance and allowing a beginner user to learn how to use the system.

Discover 508 provides an environment that is more suitable for use with screen readers, with a markup that includes among other things, properly coded headings, properly labeled links and form elements, and properly formatted and tagged tables. Properly tagged headings allow a person using a screen reader to quickly navigate to the headings that indicate the important sections of the page, and it also allows screen reader users to get a better concept of the overall layout and logical hierarchy of the page. Discover 508’s properly labeled form elements let screen reader users determine things like a particular type of edit field, such as Document Title or Date. They also help make combo boxes, check boxes, and radio buttons easier to use. Discover 508 also avoids the use of poorly formatted and tagged tables experienced in the OOTB environment.

With the Discover 508 for SharePoint solution, AFB testers found it substantially easier to manage calendar events, upload and edit documents, and collaborate with team members. Time spent learning the system and completing individual tasks was significantly shorter when using Discover 508. While SharePoint has made progress with its “More Accessible” mode, Discover 508 clearly stands out as the more accessible and usable solution. Although testers could eventually complete most tasks attempted in the out of the box environment, there were some inaccessible tasks that could not be completed without sighted assistance. The level of frustration and confusion was also significant. For example, simply changing the name of a document took nearly 20 minutes in an initial attempt. Adding a folder to a document library is an example of the difficult and sometimes illogical nature of the OOTB experience. Rather than beginning the process with something intuitive like a “New Folder” or “Add Folder” link, the user first has to activate a “New Document” link. AFBC usability testers spent nearly 40 minutes trying to determine how to create a document library, including time spent with SharePoint’s online help instructions, some of which were helpful and some of which were not. The instructions that did get the job done said to go to “Settings” and then “Add an App,” which obviously lacks a logical or intuitive path.

Discover 508 for SharePoint avoids all that difficulty and confusion. The experience with the Discover 508 solution was much more intuitive and streamlined, giving a person with vision loss the ability to complete each task as effectively and efficiently as his or her sighted peers. This is extremely important in today’s competitive job market, giving people with vision loss the ability to compete on an even playing field with their sighted peers.

ACM: Introduction to Data Access Control

July 16th, 2014 by David Miller

Background

Armedia Case Management (ACM) is a framework for developing case management applications.

Data access control ensures each user sees only the records they are authorized to see and is prevented from seeing unauthorized records. Data access control is applied to individual business objects, as opposed to role-based access control, which is based only on the user identity.

Role-based access is usually applied to each URL in a REST-oriented application.  It ensures the user is authorized for that URL; for example, that only document approvers can invoke the “/approveDocument” URL.  But role-based access by itself means any document approver can approve any document.

Spring Security easily integrates with Spring MVC to enable URL-based access control.  How can we easily add the logic to ensure that only Document A’s approver can approve Document A, and only Document B’s approver can approve Document B?  Not to mention ensuring that, until the document is approved, only users involved in the draft and approval process can even see it – so that it does not appear in anyone else’s search results or queries?
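For reference, here is roughly what that URL-level, role-based check looks like in Spring Security Java configuration; the URL pattern and role name are hypothetical, and this is only a sketch, not ACM's actual configuration. Note that it can only say who may call the URL; it says nothing about which document they may act on.

import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;

@Configuration
@EnableWebSecurity
public class UrlSecurityConfig extends WebSecurityConfigurerAdapter {

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        // Role-based (URL-level) control: any DOCUMENT_APPROVER may invoke
        // /approveDocument, for any document.
        http.authorizeRequests()
            .antMatchers("/approveDocument/**").hasRole("DOCUMENT_APPROVER")
            .anyRequest().authenticated();
    }
}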

Straightforward Custom Applications

If the application is written for a single customer and implements a straightforward process with the same rules for everyone, the easy path is to build these controls into the application code.  Embed the rules in the query logic such that the database only returns appropriate rows; add checks in the document approval logic to ensure the user is on the approver list for that document.
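For instance, a hand-built query in such an application might embed the access rule directly in the JPQL, so the database only returns rows the current user may see. The Document entity and its fields here are hypothetical, purely to illustrate the idea.

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;

public class DraftDocumentQuery {

    // Only drafts on which this user is an approver come back from the database;
    // the access rule lives inside the query itself.
    public List<Object[]> findVisibleDrafts(EntityManager em, String userId) {
        TypedQuery<Object[]> query = em.createQuery(
                "SELECT d.id, d.title FROM Document d "
                + "WHERE d.status = 'DRAFT' AND :user MEMBER OF d.approvers",
                Object[].class);
        query.setParameter("user", userId);
        return query.getResultList();
    }
}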

If you design and build this application very carefully, and you understand the customer very well, and their requirements do not change faster than you can update the application; then this approach can work.  I’ve written many such applications and they were very successful; the users were happy, and all was well.

Larger, More Diverse Customers

The larger the customer organization, and the more departments and geographic regions being served, the harder it gets to implement data access logic in code.  I tried this approach for a large government agency where each region had slightly different rules.  The implementation got pretty difficult.  The queries became long, and the results came back much slower; the database isn’t really meant to apply sophisticated access control rules over very large data sets.  Let’s just say the customer was less happy.

Many Different Customers with Different Problem Domains

Let’s just extend the previous scenario to the ACM arena, where the framework has to satisfy entirely different customers, each of whom has radically different rules and even different types of business objects.  Now the straightforward approach of implementing access logic in code amounts to a commitment to rewrite the entire application for each customer.  Now my Armedia leadership team (the people who sign my paychecks!) are less happy!

The ACM Solution

In ACM, we have a flexible solution.  We have a fast way to implement the most common types of rules and we also provide a mechanism to implement more complicated rules.  And we have a fast, reliable way to evaluate the data access controls at runtime.

My next few blog posts will explore this solution in more detail.

In a nutshell, ACM requires the results of all data access control rules to be embodied in a database table: not the rules themselves, but the result of each rule as applied to each domain/business object.  This table holds the resulting access entries for each individual domain/business object.  Each domain/business object’s entries are also indexed in the Apache SOLR search engine.  This allows ACM to encode the current user’s access rights (group membership and any other access tokens) as a set of boolean restrictions in the SOLR search query.  SOLR is designed to efficiently evaluate multiple boolean search conditions.  This gives us fast search results including only the domain/business objects the user is allowed to see.
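To sketch what that could look like with SolrJ (the core name and the acl_read field are hypothetical, not ACM's actual schema), the user's access tokens become filter-query restrictions that SOLR evaluates itself:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AclFilteredSearch {

    // The user id and group names are ORed together into one boolean filter,
    // so unauthorized objects are dropped inside SOLR, before pagination
    // and hit counts are computed.
    public static QueryResponse search(String terms, String userId, String... groups)
            throws SolrServerException {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/acm");

        SolrQuery query = new SolrQuery(terms);
        StringBuilder acl = new StringBuilder("acl_read:").append(userId);
        for (String group : groups) {
            acl.append(" OR acl_read:").append(group);
        }
        query.addFilterQuery(acl.toString());
        query.setRows(20);
        return solr.query(query);
    }
}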

More to come – stay tuned!

Click Here to See all Blogs about Armedia Case Management

 

Security and User Access Control in Ephesoft 3.1

June 30th, 2014 by Chae Kim

 

Overview of Ephesoft Security

With the introduction of Single Sign-On (SSO) and other great new features in Ephesoft version 3.1, such as new extraction features (Fuzzy Key Field, Zone Extraction, Regular Expression Builder), new classification features (Test Classification Tool, Advanced DA Switch, Regex Classification Switch), application-level scripting, email batch processing, and RecoStar TIFF conversion, Ephesoft is now more enterprise-ready than ever before. While working with the Ephesoft SSO and user access control implementations for my client, I had a chance to explore Ephesoft’s adherence to the CIA principles – Confidentiality, Integrity, and Availability of information.

Also known as the CIA triad, CIA is a fundamental concept in information security and often used as a guideline for information security policies within an organization. Confidentiality refers to limiting information access and disclosure to authorized users and preventing access by unauthorized users, therefore protecting information from any unauthorized access. Integrity refers to the accuracy and trustworthiness of information and protecting information from unauthorized modification. Finally, Availability refers to a guarantee of access to the information by authorized users.

In this blog, I wanted to concentrate on the “Confidentiality” of Ephesoft. The following are Ephesoft features that ensure confidentiality of the Ephesoft document capture system:

  • SSO through HTTP(S) header request – User-based authentication, which ensures data confidentiality, is controlled at the organization level.
  • Integration with secure MS Active Directory or LDAP – In addition to the user authentication, user authorization can be provided based on roles configured with secure MS Active Directory or LDAP server.
  • Role-based user access control – User access control ensures that only users with valid roles can access different areas of the application and the information in Batch Classes and batch instances that is intended only for the right people. The following are examples of role-based user access control:
  1. Security Constraint
  2. Batch Class Access
  3. Batch Instance Access

Ephesoft 3.1 Product Documentation available on the Ephesoft website provides detailed information on the SSO and User Management using MS-AD, LDAP, and Tomcat. Please refer to This Link for more information.

Examples of User Access Control

In addition to the Product Documentation, we can further explore the examples of the role-based user access control here.

Security Constraint

Role-based application access control lets you limit access to the Ephesoft user interfaces. The following table shows each UI represented as a web resource and a suggested role type for each resource.

Ephesoft_User_Role_Types

 

If needed, the role types can be more specialized, such as Scan Operator, Review/Validate Operator, etc. The following table shows an example of specialized roles and the web resources accessed by each role type.

Ephesoft_Specialized_Role_Types

 

Below is an example of the “batch list” security constraint configured in <Ephesoft Installation Path>\Application\WEB-INF\web.xml.

	<security-constraint>
		<web-resource-collection>
			<web-resource-name>batch list</web-resource-name>
			<url-pattern>/ReviewValidate.html</url-pattern>
			<http-method>GET</http-method>
			<http-method>POST</http-method>
		</web-resource-collection>
		<auth-constraint>
			<role-name>ReviewValidate Operator</role-name>
			<role-name>Scan Operator</role-name>
		</auth-constraint>
	</security-constraint>

*Please note that for Ephesoft 3.1, if SSO is configured to be in use, the security constraints need to be commented out in web.xml, because security constraints in conjunction with SSO are not yet fully developed. Ephesoft expects to provide security constraints fully compatible with SSO in the next major patch release.

 

Batch Class Access

Ephesoft made it very simple to apply role based access to Batch Classes and batch instances. You can simply navigate to the Batch Class Configuration section and pick the role you want as shown below.

Ephesoft_Role_Window

 

Each Batch Class can be configured with available user role(s), so only the users that belong to such role(s) can access the Batch Class and batches created based on the Batch Class. This Batch Class user access control can be very useful in providing variable scan processing depending on unique group or departmental usages within a large organization. Ephesoft can be shared by multiple departments within an organization, but each department sees the Batch Class and batch instances that are only relevant to the department.

Batch Instance Access

It is a common practice for multiple groups or departments within an organization to share a single Ephesoft system, and if processes differ, each department can have a separate Batch Class to handle its scanning needs, as explained in the Batch Class Access section. However, multiple Batch Classes for the same process can be difficult to maintain. If a single Batch Class needs to be shared by multiple departments, the Batch Instance Group feature can provide a customized view of the batch list with dynamic assignment of a user role to each batch instance.

The Batch Instance Group feature, which uses one of the Ephesoft application database tables, batch_instance_groups, allows you to assign a group name to each batch instance through simple custom scripting. The method below is an example, developed from the sample script ScriptDocumentAssembler_BatchInstanceGrouFeature.java, which is available for download from the Ephesoft Script Guide.

//get the batch ID from batch.xml
	Element root = documentFile.getRootElement();
	Element batchInstanceID = (Element) root.getChild(BATCH_INSTANCE_ID);
	if (batchInstanceID == null) {
		return;
	}
	batchID = batchInstanceID.getValue();

	if (batchID != null && batchGroup != null) {
		// Retrieve DB connection info from the dcma-db.properties file
		Properties prop = new Properties();
		String pathHome = System.getenv("DCMA_HOME");
		String pathProp = "WEB-INF/classes/META-INF/dcma-data-access/dcma-db.properties";
		File propFile = new File(pathHome, pathProp);
		InputStream input = null;
		try {
			input = new FileInputStream(propFile);
		} catch (FileNotFoundException e) {
			e.printStackTrace();
		}
		try {
			prop.load(input);
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			if (input != null) {
				try { input.close(); } catch (IOException e) { }
			}
		}

		// Get the URL, username, and password to make the DB connection
		String username = (String) prop.get("dataSource.username");
		String password = (String) prop.get("dataSource.password");
		String driverClassName = (String) prop.get("dataSource.driverClassName");
		String databaseName = (String) prop.get("dataSource.databaseName");
		String serverName = (String) prop.get("dataSource.serverName");
		String url = (String) prop.get("dataSource.url");

		url = url.replace("${dataSource.serverName}", serverName);
		url = url.replace("${dataSource.databaseName}", databaseName);

		// Execute a SQL insert to assign the group name in the batch_instance_groups table
		Connection conn = null;
		Statement stmt = null;
		try {
			Class.forName(driverClassName).newInstance();
			conn = DriverManager.getConnection(url, username, password);

			String sqlInsert = "insert into batch_instance_groups(creation_date, last_modified, batch_instance_id, group_name) VALUES (Now(), Now(), '" + batchID + "', '" + batchGroup + "')";
			stmt = conn.createStatement();
			stmt.executeUpdate(sqlInsert);

		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			if (stmt != null) {
				try { stmt.close(); } catch (SQLException e) { }
			}
			if (conn != null) {
				try { conn.close(); } catch (SQLException e) { }
			}
		}
	} else {
		System.err.println("Cannot assign Batch Instance Group - missing Batch ID and/or User Group.");
	}
} // closing brace of the enclosing method (signature not shown)
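One note if you adapt this script: building the SQL by string concatenation works, but a parameterized statement is safer and avoids quoting problems in the batch ID or group name. Here is a sketch of the same insert; it reuses the conn, batchID, and batchGroup variables from the script above and assumes java.sql.PreparedStatement is imported.

	// Same insert as above, using a parameterized statement
	String sqlInsert = "insert into batch_instance_groups"
			+ "(creation_date, last_modified, batch_instance_id, group_name) "
			+ "VALUES (Now(), Now(), ?, ?)";
	PreparedStatement ps = null;
	try {
		ps = conn.prepareStatement(sqlInsert);
		ps.setString(1, batchID);
		ps.setString(2, batchGroup);
		ps.executeUpdate();
	} catch (Exception e) {
		e.printStackTrace();
	} finally {
		if (ps != null) {
			try { ps.close(); } catch (SQLException e) { }
		}
	}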

 

Depending on the batch group assignment logic of your choice, such as the scan operator, info on the batch cover sheet, document type, or metadata extracted from documents, you can dynamically assign different role groups to each batch and provide specialized access to the batch instances.

As you can see from the examples of user access control mentioned in this blog, Ephesoft is capable of providing multiple layers of user access control that are easy to apply and configure. Ephesoft not only makes it easy to process your documents, but also puts you in full control of user access to your valuable information.

Adding Full Text Search to ACM via Spring and JPA

June 24th, 2014 by David Miller

What, No Full Text Search Already?

My project Armedia Case Management (ACM) is a Spring application that integrates with Alfresco (and other ECM platforms) via CMIS – the Content Management Interoperability Standard.  ACM stores metadata in a database, and content files in the ECM platform.  Our customers so far have not needed integrated full text search; plain old database queries have sufficed. Eventually we know full text search has to be addressed.  Why not now, since ACM has been getting some love?  Plus, high quality search engines such as SOLR are free, documented in excellent books, and could provide more analytic services than just plain old search.

Goals

What do we want from SOLR Search integration?

  1. We want both quick search and advanced search capabilities.  Quick search should be fast and search only metadata (case number, task assignee, …).  Quick search is to let users find an object quickly based on the object ID or the assignee.  Advanced search should still be fast, but includes content file search and more fields.  Advanced search is to let users explore all the business objects in the application.
  2. Search results should be integrated with data access control.  Only results the user is authorized to see should appear in the search results.  This means two users with different access rights could see different results, even when searching for the same terms.
  3. The object types to be indexed, and the specific fields to be indexed for each object type, should be configurable at run time.  Each ACM installation may trace different object types, and different customers may want to index different data.  So at runtime the administrator should be able to enable and disable different object types, and control which fields are indexed.
  4. Results from ACM metadata and results from the content files (stored in the ECM platform) should be combined in a seamless fashion.  We don’t want to extend the ECM full-text search engine to index the ACM metadata, and we don’t want the ACM metadata full text index to duplicate the ECM engine’s data (we don’t want to re-index all the content files already indexed by the ECM).  So we will have two indexes: the ACM metadata index, and the ECM content file index.  But the user should never be conscious of this; the ACM search user interface and search results should maintain the illusion of a single coherent full text search index.

Both Quick Search and Advanced Search

To enable both quick search and advanced search modes, I created two separate SOLR collections.  The quick search collection includes only the metadata fields to be searched via the Quick Search user interface.  The full collection includes all indexed metadata.  Clearly these two indexes are somewhat redundant since the full collection almost certainly includes everything indexed in the quick search collection.  As soon as we have a performance test environment I’ll try to measure whether maintaining the smaller quick search collection really makes sense.  If the quick search collection is not materially faster than the equivalent search on the full index, then we can stop maintaining the quick search collection.
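As a rough sketch of how the two collections might be used from application code (the collection names and URLs are made up for illustration; core setup and schema are omitted):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AcmSearchClient {

    // A small metadata-only core backs Quick Search; a full core with every
    // indexed field backs Advanced Search.
    private final HttpSolrServer quickSearchCore =
            new HttpSolrServer("http://localhost:8983/solr/acmQuickSearch");
    private final HttpSolrServer advancedSearchCore =
            new HttpSolrServer("http://localhost:8983/solr/acmAdvancedSearch");

    public QueryResponse search(String terms, boolean quickSearch) throws SolrServerException {
        SolrQuery query = new SolrQuery(terms);
        query.setRows(20);
        // Quick search hits the smaller collection; advanced search hits the full one.
        return (quickSearch ? quickSearchCore : advancedSearchCore).query(query);
    }
}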

Integration with Data Access Control

Data access control is a touchy issue since the full text search queries must still be fast, the pagination must continue to work, and the hit counts must still be accurate.  These goals are difficult to reach if application code applies data access control to the search results after they leave the search engine.  So I plan to encode the access control lists into the search engine itself, so the access control becomes just another part of the search query.  Search Technologies has a fine series of articles about this “early binding” architecture: http://www.searchtechnologies.com/search-engine-security.html.

Configurable at Runtime

ACM has a basic pattern for runtime-configurable options.  We encode the options into a Spring XML configuration file, which we load at runtime by monitoring a Spring load folder.  This allows us to support as many search configurations as we need: one Spring full-text-search config file for each business object type.  At some future time we will add an administrator control panel with a user interface for reading and writing such configuration files.  This Spring XML profile configures the business object to be indexed.  For business objects stored in ACM tables, this configuration includes the JPA entity name, the entity properties to be indexed, the corresponding SOLR field names, and how often the database is polled for new records.  For Activiti workflow objects, the configuration includes the Activiti object type (tasks or business processes), and the properties to be indexed.

Seamless Integration of Database, Activiti, and ECM Data Sources

The user should not realize the indexed data is from multiple repositories.

Integrating database and Activiti data sources is easy: we just feed data from both sources into the same SOLR collection.

The ECM already indexes its content files.  We don’t want to duplicate the ECM index, and we especially don’t want to dig beneath the vendor’s documented search interfaces.

So in our application code, we need to make two queries: one to the ACM SOLR index (which indexes the database and the Activiti data), and another query to the ECM index.  Then we need to merge the two result sets.  As we encounter challenges with this double query and result set merging I may write more blog articles!
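Here is a rough, hedged sketch of that double query and merge step; the SearchHit type is hypothetical and stands in for whatever shape each source's results get mapped into.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MergedSearchResults {

    // Minimal hypothetical hit record; both the ACM SOLR results and the ECM
    // results would be mapped into this shape before merging.
    static class SearchHit {
        final String objectId;
        final String source;   // "ACM" or "ECM"
        final double score;    // note: raw scores from two engines are not directly
                               // comparable and would need normalization in practice

        SearchHit(String objectId, String source, double score) {
            this.objectId = objectId;
            this.source = source;
            this.score = score;
        }
    }

    // Combine both result lists into one page, so the UI maintains the illusion
    // of a single coherent index.
    static List<SearchHit> merge(List<SearchHit> acmHits, List<SearchHit> ecmHits, int pageSize) {
        List<SearchHit> combined = new ArrayList<SearchHit>(acmHits);
        combined.addAll(ecmHits);
        combined.sort(Comparator.comparingDouble((SearchHit h) -> h.score).reversed());
        return combined.subList(0, Math.min(pageSize, combined.size()));
    }
}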

Closing Thoughts

SOLR is very easy to work with.  I may use it for more than straightforward full text search.  For example, the navigation panels with the lists of cases, lists of tasks, lists of complaints, and so on include only data in the SOLR quick search collection.  So in theory we should be able to query SOLR to populate those lists – versus calling JPA queries.  Again, once we have a performance test environment I can tell whether SOLR queries or JPA queries are faster in general.

Stay up-to-date on all my blogs about my Armedia Case Management projects. 

Initial Thoughts on Amazon S3 and DynamoDB

June 19th, 2014 by Judy Hsu

I’ve been tinkering with Amazon S3 and DynamoDB to get exposed to NoSQL databases.  I haven’t had the need to get down to the nitty-gritty, so I am not managing REST (or SOAP) calls myself; I’ve just been using the AWS SDK for Java.  I am writing this post to gather the initial thoughts I have so far.

I wanted to learn more about AWS because, as Amazon says, they want to “enable developers to focus on innovating with data, rather than how to store it”.  You don’t have the pain of designing the infrastructure needs of the whole system up front, or later when you need more performance and reliability.  S3 has 99.999999999% durability, with 99.99% availability, by replicating across several facilities in a region.  It scales automatically, you don’t do anything, and it remains available as it’s doing that under the covers.  Amazon has a great ‘Free Usage Tier’ for folks like me (and you) who are just starting out and want to get hands-on experience.  Not all of the services are offered in the free usage tier (S3 and DynamoDB are), so take a look!

Summaries

Security credentials

There is an AWS root account credential.  One of the first things you should do is create a user for yourself and assign it to the admin group.  Never use your AWS root account credential directly! You can then create IAM (Identity and Access Management) users depending on the application’s needs, creating groups with logical functions and assigning users to those groups.

Amazon S3

  • You literally don’t have to configure anything to get started: you have a key, you upload your value, and you just keep storing things in S3. You would still need some way to keep track of which keys you are using for your specific application
  • Doesn’t support object locking; you’ll need to build this manually if it’s needed
  • Max object size is 5 TB, and there is no limit on the number of objects you can store
  • Uses the eventual consistency model

DynamoDB

  • The minimum to configure when creating your tables is to specify the table primary key and provision the read and write throughput needed (see the sketch after this list)
  • Supports optimistic object locking
  • Max item size is 64 KB; no limit on the number of attributes for an item
  • Supports either the eventually consistent or the strongly consistent read model
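For example, a minimal table-creation sketch with the AWS SDK for Java might look like this; the table name, key name, and capacity values are made up, and the credentials setup mirrors the S3 tests below.

import com.amazonaws.auth.ClasspathPropertiesFileCredentialsProvider;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;

public class CreateTableExample {

    public static void main(String[] args) {
        // Same credentials approach as the S3 tests below: an IAM user's keys
        // on the classpath, never the root account credential.
        AmazonDynamoDBClient client = new AmazonDynamoDBClient(
                new ClasspathPropertiesFileCredentialsProvider());
        client.setRegion(Region.getRegion(Regions.US_WEST_2));

        // The two things you must decide up front: the primary key (a hash key
        // named "Id" here) and the provisioned read/write capacity.
        CreateTableRequest request = new CreateTableRequest()
                .withTableName("cherryshoe-sample")   // hypothetical table name
                .withKeySchema(new KeySchemaElement("Id", KeyType.HASH))
                .withAttributeDefinitions(new AttributeDefinition("Id", ScalarAttributeType.S))
                .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L));

        client.createTable(request);
    }
}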

When to use S3 versus DynamoDB?

S3 is for larger data that you are rarely going to touch (like storing data for analysis or backup/archiving of data), whereas DynamoDB is more for data that is more dynamic.  As we already talked about, with both you can start with a small amount of data and scale up and down as requirements change.  For DynamoDB, you would also need to adjust your read and write capacities.  In your custom application, you can also use a mix of both, or use other Amazon services simultaneously.

The storage for S3 and DynamoDB is relatively cheap, and it only gets cheaper.  So if you are developing an app for your local, dev, test, or pre-prod environments, you may just want to go ahead and use different instances of the services.  Or you can use a sandbox tool like the ZCloud AWS sandbox for S3.

Now what, want to start playing around?
Get the AWS SDK for Java:

Some sample code:

Next in line for NoSQL databases are Apache Cassandra and MongoDB; I’ll have more thoughts on those in the coming months.

For more examples, ideas, and inspiration, feel free to read through the links provided in the “Now what, want to start playing around” section.  What questions do you have about this post? Let me know in the comments section below, and I will answer each one.

***********

For details on the code that I played around with: I grabbed the AWS SDK for Java with Maven and did everything in JUnit integration tests.  The starting point for most of this code was the sample code links above.

Note: You’ll see that in all test classes, the last thing done is to delete all the objects that were created.  We’re not using much data here, and it’s the Free Tier, but it doesn’t hurt to be safe!

Credentials

Placed in the classpath under src/main/resources/AwsCredentials.properties (use one of your IAM accounts here, not your root credentials!)

pom.xml

<!-- For Amazon Java SDK -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>${awsjavasdk.version}</version>
</dependency>

 

S3 – BaseTestCase.java

package com.cherryshoe.services.sdk.s3;

import org.junit.After;
import org.junit.Before;
import org.junit.Rule;
import org.junit.rules.TestName;

import com.amazonaws.auth.ClasspathPropertiesFileCredentialsProvider;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.Bucket;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class BaseTestCase {

    private AmazonS3 s3Service;
    private String bucketPrepend = "cherryshoe";

    @Rule public TestName name = new TestName();

    public BaseTestCase()
    {
        super();
        this.s3Service = new AmazonS3Client(new ClasspathPropertiesFileCredentialsProvider());
        Region region = Region.getRegion(Regions.US_WEST_2);
        s3Service.setRegion(region);
    }

    public AmazonS3 getS3Service() {
        return s3Service;
    }    

    public String getBucketPrepend() {
        return bucketPrepend;
    }

    @Before
    public void printBeforeTestRun() throws Exception
    {
        System.out.println("-------------------------------------------------------------------------------------");
        System.out.println("Starting Test: " + name.getMethodName());
        System.out.println("-------------------------------------------------------------------------------------");
    }

    @After
    public void printAfterTestRun() throws Exception
    {
        System.out.println("-------------------------------------------------------------------------------------");
        System.out.println("Finished Test: " + name.getMethodName());
        System.out.println("-------------------------------------------------------------------------------------");
        System.out.println();
    }

    protected void deleteAllObjectsInAllBuckets() throws Exception {

        System.out.println("Starting deleting all objects in all buckets");
        for (Bucket bucket : getS3Service().listBuckets()) {
            String bucketName = bucket.getName();            

            ListObjectsRequest getObjectsRequest = new ListObjectsRequest();
            getObjectsRequest.setBucketName(bucketName);
            ObjectListing objectListing = getS3Service().listObjects(
                    getObjectsRequest);

            System.out.println("Deleting objects from bucket[" + bucketName + "]");
            for (S3ObjectSummary objectSummary : objectListing
                    .getObjectSummaries()) {
                System.out.println("Deleting object with key["
                        + objectSummary.getKey() + "]");
                getS3Service().deleteObject(bucketName, objectSummary.getKey());
            }

            deleteBucket(bucketName);
        }

    }

    protected void createBucket(String bucketName) throws Exception {
        // create bucket
        System.out.println("Creating bucket[" + bucketName +"]");
        getS3Service().createBucket(bucketName);
    }

    protected boolean isBucketExists(String bucketName) throws Exception {
        System.out.println("Listing buckets...");
        boolean foundBucket = false;
        for (Bucket bucket : getS3Service().listBuckets()) {
            String name = bucket.getName();
            System.out.println(name);
            if (name.equals(bucketName))
                foundBucket = true;
        }    
        return foundBucket;
    }

    private void deleteBucket(String bucketName) throws Exception {
        System.out.println("Deleting bucket[" + bucketName + "]");
        getS3Service().deleteBucket(bucketName);
    }

}

 

 CreateObjectTest.java

package com.cherryshoe.services.sdk.s3;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.junit.After;
import org.junit.Assert;
import org.junit.Test;

import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.s3.model.CopyObjectRequest;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.PutObjectResult;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectInputStream;
import com.cherryshoe.services.web.Utils;

public class CreateObjectTest extends BaseTestCase {

    @After
    public void cleanUp() throws Exception {
        // delete
        deleteAllObjectsInAllBuckets();
    }

    @Test
    public void createBucketAndObject() throws Exception {
        String bucketName = getBucketPrepend() + Utils.get10DigitUniqueId();
        createBucket(bucketName);
        Assert.assertTrue(isBucketExists(bucketName));

        /*
         * Upload an object to your bucket - You can easily upload a file to S3,
         * or upload directly an InputStream if you know the length of the data
         * in the stream. You can also specify your own metadata when uploading
         * to S3, which allows you set a variety of options like content-type
         * and content-encoding, plus additional metadata specific to your
         * applications.
         */
        String key = Utils.get10DigitUniqueId();
        System.out.println("key of object to create[" + key + "]");
        String pathToSourceFile = "./src/test/resources/1pager.tif";
        File fileData = new File(pathToSourceFile);
        Assert.assertTrue(fileData.exists());

        PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName,
                key, fileData);
        PutObjectResult putObjectResult = getS3Service().putObject(
                putObjectRequest);
    }

    /*
     * S3 you have to copy the object to replace metadata on it....
     * http://docs.aws.amazon.com/AmazonS3/latest/dev/CopyingObjectsExamples.html
     */
    @Test
    public void copyObject() throws Exception {

        try {
            String bucketName = getBucketPrepend() + Utils.get10DigitUniqueId();
            createBucket(bucketName);
            Assert.assertTrue(isBucketExists(bucketName));

            /*
             * Upload an object to your bucket - You can easily upload a file to
             * S3, or upload directly an InputStream if you know the length of
             * the data in the stream. You can also specify your own metadata
             * when uploading to S3, which allows you set a variety of options
             * like content-type and content-encoding, plus additional metadata
             * specific to your applications.
             */
            String key = Utils.get10DigitUniqueId();
            System.out.println("key of object to create[" + key + "]");
            String pathToSourceFile = "./src/test/resources/1pager.tif";
            File fileData = new File(pathToSourceFile);
            Assert.assertTrue(fileData.exists());

            ObjectMetadata objectMetadata = new ObjectMetadata();
            String metadataKey1 = "AppSpecific1";
            String metadataVal1 = "AppSpecificMetadata1";
            objectMetadata.addUserMetadata(metadataKey1, metadataVal1);

            PutObjectRequest putObjectRequest = new PutObjectRequest(
                    bucketName, key, fileData).withMetadata(objectMetadata);
            PutObjectResult putObjectResult = getS3Service().putObject(
                    putObjectRequest);

            /*
             * Download an object - When you download an object, you get all of
             * the object's metadata and a stream from which to read the
             * contents. It's important to read the contents of the stream as
             * quickly as possibly since the data is streamed directly from
             * Amazon S3 and your network connection will remain open until you
             * read all the data or close the input stream.
             * 
             * GetObjectRequest also supports several other options, including
             * conditional downloading of objects based on modification times,
             * ETags, and selectively downloading a range of an object.
             */
            System.out.println("Downloading object[" + key + "]");

            GetObjectRequest getObjectRequest = new GetObjectRequest(
                    bucketName, key);
            Assert.assertNotNull(getObjectRequest);
            S3Object s3Object = getS3Service().getObject(getObjectRequest);
            Assert.assertNotNull(s3Object);

            System.out.println("Getting metadata[" + key + "]");
            System.out.println("Content-Type: "
                    + s3Object.getObjectMetadata().getContentType());
            System.out.println("VersionId: "
                    + s3Object.getObjectMetadata().getVersionId());

            // ha! the metadata keys get lowercased on S3 side
            String metadataVal1Ret = s3Object.getObjectMetadata()
                    .getUserMetadata().get(metadataKey1.toLowerCase());
            System.out.println(metadataKey1.toLowerCase() + ": "
                    + metadataVal1Ret);
            Assert.assertEquals(metadataVal1, metadataVal1Ret);

            //////////////////////////
            // Copying object, update the metadata
            //////////////////////////
            objectMetadata = new ObjectMetadata();
            String metadataKey2 = "AppSpecific2";
            String metadataVal2 = "AppSpecificMetadata2";
            objectMetadata.addUserMetadata(metadataKey2, metadataVal2);

            String newKey = Utils.get9DigitUniqueId();
            System.out.println("key of object to copy[" + newKey + "]");
            CopyObjectRequest copyObjRequest = new CopyObjectRequest(
                    bucketName, key, bucketName, newKey).withNewObjectMetadata(objectMetadata);
            System.out.println("Copying object.");
            getS3Service().copyObject(copyObjRequest);

            // get object
            getObjectRequest = new GetObjectRequest(
                    bucketName, newKey);
            Assert.assertNotNull(getObjectRequest);
            s3Object = getS3Service().getObject(getObjectRequest);
            Assert.assertNotNull(s3Object);

            // confirm the copied object's contents are the same as what was uploaded
            System.out.println("Getting content[" + newKey + "]");
            S3ObjectInputStream s3ObjectInputStream = s3Object
                    .getObjectContent();
            boolean deleteNewFile = false;
            boolean filesEqual = validateObject(s3ObjectInputStream,
                    deleteNewFile, pathToSourceFile);
            Assert.assertTrue(filesEqual);

            System.out.println("Getting metadata[" + newKey + "]");
            System.out.println("Content-Type: "
                    + s3Object.getObjectMetadata().getContentType());
            System.out.println("VersionId: "
                    + s3Object.getObjectMetadata().getVersionId());

            // ha! the metadata keys get lowercased on S3 side
            String metadataVal2Ret = s3Object.getObjectMetadata()
                    .getUserMetadata().get(metadataKey2.toLowerCase());
            System.out.println(metadataKey2.toLowerCase() + ": "
                    + metadataVal2Ret);
            Assert.assertEquals(metadataVal2, metadataVal2Ret);

        } catch (AmazonServiceException ase) {
            ase.printStackTrace();
            System.out
                    .println("Caught an AmazonServiceException, which means your request made it "
                            + "to Amazon S3, but was rejected with an error response for some reason.");
            System.out.println("Error Message:    " + ase.getMessage());
            System.out.println("HTTP Status Code: " + ase.getStatusCode());
            System.out.println("AWS Error Code:   " + ase.getErrorCode());
            System.out.println("Error Type:       " + ase.getErrorType());
            System.out.println("Request ID:       " + ase.getRequestId());
        }

    }

    @Test
    public void createObjectTest() throws Exception {
        try {
            String bucketName = getBucketPrepend() + Utils.get10DigitUniqueId();
            createBucket(bucketName);
            Assert.assertTrue(isBucketExists(bucketName));

            /*
             * Upload an object to your bucket - You can easily upload a file to
             * S3, or upload directly an InputStream if you know the length of
             * the data in the stream. You can also specify your own metadata
             * when uploading to S3, which allows you to set a variety of options
             * like content-type and content-encoding, plus additional metadata
             * specific to your applications.
             */
            String key = Utils.get10DigitUniqueId();
            System.out.println("key of object to create[" + key + "]");
            String pathToSourceFile = "./src/test/resources/1pager.tif";
            File fileData = new File(pathToSourceFile);
            Assert.assertTrue(fileData.exists());

            ObjectMetadata objectMetadata = new ObjectMetadata();
            String metadataKey1 = "AppSpecific1";
            String metadataVal1 = "AppSpecificMetadata1";
            objectMetadata.addUserMetadata(metadataKey1, metadataVal1);

            PutObjectRequest putObjectRequest = new PutObjectRequest(
                    bucketName, key, fileData).withMetadata(objectMetadata);
            PutObjectResult putObjectResult = getS3Service().putObject(
                    putObjectRequest);

            /*
             * Download an object - When you download an object, you get all of
             * the object's metadata and a stream from which to read the
             * contents. It's important to read the contents of the stream as
             * quickly as possible since the data is streamed directly from
             * Amazon S3 and your network connection will remain open until you
             * read all the data or close the input stream.
             * 
             * GetObjectRequest also supports several other options, including
             * conditional downloading of objects based on modification times,
             * ETags, and selectively downloading a range of an object.
             */
            System.out.println("Downloading object[" + key + "]");

            GetObjectRequest getObjectRequest = new GetObjectRequest(
                    bucketName, key);
            Assert.assertNotNull(getObjectRequest);
            S3Object s3Object = getS3Service().getObject(getObjectRequest);
            Assert.assertNotNull(s3Object);

            // confirm the downloaded contents are the same as what was uploaded
            System.out.println("Getting content[" + key + "]");
            S3ObjectInputStream s3ObjectInputStream = s3Object
                    .getObjectContent();
            boolean deleteNewFile = false;
            boolean filesEqual = validateObject(s3ObjectInputStream,
                    deleteNewFile, pathToSourceFile);
            Assert.assertTrue(filesEqual);

            System.out.println("Getting metadata[" + key + "]");
            System.out.println("Content-Type: "
                    + s3Object.getObjectMetadata().getContentType());
            System.out.println("VersionId: "
                    + s3Object.getObjectMetadata().getVersionId());

            // ha! the metadata keys get lowercased on S3 side
            String metadataVal1Ret = s3Object.getObjectMetadata()
                    .getUserMetadata().get(metadataKey1.toLowerCase());
            System.out.println(metadataKey1.toLowerCase() + ": "
                    + metadataVal1Ret);
            Assert.assertEquals(metadataVal1, metadataVal1Ret);

        } catch (AmazonServiceException ase) {
            ase.printStackTrace();
            System.out
                    .println("Caught an AmazonServiceException, which means your request made it "
                            + "to Amazon S3, but was rejected with an error response for some reason.");
            System.out.println("Error Message:    " + ase.getMessage());
            System.out.println("HTTP Status Code: " + ase.getStatusCode());
            System.out.println("AWS Error Code:   " + ase.getErrorCode());
            System.out.println("Error Type:       " + ase.getErrorType());
            System.out.println("Request ID:       " + ase.getRequestId());
        }
    }

    protected boolean validateObject(InputStream inputStream,
            boolean deleteNewFile, String pathToSourceFile) throws Exception {

        String pathRetrievedFile = writeFile(inputStream, pathToSourceFile);

        // if we got here, file was written successfully
        // so check if file sizes are different. If they are, then document was
        // not retrieved successfully.
        File sourceFile = new File(pathToSourceFile);
        File retrievedFile = new File(pathRetrievedFile);

        boolean filesEqual = false;
        try {
            filesEqual = sourceFile.length() == retrievedFile.length();
        } finally {
            if (deleteNewFile) {
                // delete retrieved file
                retrievedFile.delete();
            }
        }

        return filesEqual;
    }

    protected String writeFile(InputStream inputStream, String pathToSourceFile)
            throws Exception {
        String pathToBinaryFile = "./src/test/resources/s3ObjectFile"
                + Utils.getRandomId() + ".tif";

        try {
            // Read the binary data and write to file

            File file = new File(pathToBinaryFile);

            OutputStream output = new FileOutputStream(file);

            byte[] buffer = new byte[8 * 1024];

            int bytesRead;

            try {
                while ((bytesRead = inputStream.read(buffer)) != -1) {
                    output.write(buffer, 0, bytesRead);
                }
            } finally {

                // Closing the input stream will trigger connection release
                inputStream.close();

                // close file
                output.close();
            }
        } catch (IOException ex) {
            // In case of an IOException the connection will be released
            // back to the connection manager automatically
            throw ex;
        } catch (RuntimeException ex) {
            throw ex;
        }

        return pathToBinaryFile;
    }

}
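
One caveat on the helper methods above: validateObject only compares file lengths, which is a quick sanity check rather than a true byte-for-byte comparison. A stricter variant (a minimal sketch of my own, not part of the original test class; it relies on java.nio.file.Files, so Java 7 or later) could compare the contents directly:

    // Hypothetical stricter check: compare the actual bytes of the source and
    // retrieved files instead of just their lengths.
    protected boolean contentsEqual(String pathToSourceFile,
            String pathToRetrievedFile) throws Exception {
        byte[] sourceBytes = java.nio.file.Files.readAllBytes(
                new File(pathToSourceFile).toPath());
        byte[] retrievedBytes = java.nio.file.Files.readAllBytes(
                new File(pathToRetrievedFile).toPath());
        return java.util.Arrays.equals(sourceBytes, retrievedBytes);
    }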

 

DynamoDB

P.S. – I haven’t had time to look at DynamoDapper yet.
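
Assuming “DynamoDapper” refers to the SDK’s annotation-based DynamoDBMapper, here is a rough sketch of what that higher-level approach looks like. This is my own illustration, not part of the test code below; the Book class is hypothetical, with an Integer hash key to match the numeric Id attribute the tests use:

import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAttribute;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;

// Hypothetical POJO mapped to the "Judy" table created in the tests below.
@DynamoDBTable(tableName = "Judy")
public class Book {

    private Integer id;
    private String title;

    @DynamoDBHashKey(attributeName = "Id")
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    @DynamoDBAttribute(attributeName = "Title")
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}

With a DynamoDBMapper (from com.amazonaws.services.dynamodbv2.datamodeling) you can then save and load items without building attribute maps by hand:

    // Sketch only: the client is configured the same way as in BaseTestCase below.
    DynamoDBMapper mapper = new DynamoDBMapper(getClient());

    Book book = new Book();
    book.setId(101);
    book.setTitle("Book 101 Title");
    mapper.save(book);                           // PutItem under the covers
    Book loaded = mapper.load(Book.class, 101);  // GetItem by hash key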

BaseTestCase.java

package com.cherryshoe.services.sdk.dynamodb;

import org.junit.After;
import org.junit.Before;
import org.junit.Rule;
import org.junit.rules.TestName;

import com.amazonaws.auth.ClasspathPropertiesFileCredentialsProvider;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;

public class BaseTestCase {

    private AmazonDynamoDBClient client;

    @Rule public TestName name = new TestName();

    public BaseTestCase()
    {
        super();
        this.client = new AmazonDynamoDBClient(new ClasspathPropertiesFileCredentialsProvider());
        Region region = Region.getRegion(Regions.US_WEST_2);
        client.setRegion(region);
    }

    public AmazonDynamoDBClient getClient() {
        return client;
    }    

    @Before
    public void printBeforeTestRun() throws Exception
    {
        System.out.println("-------------------------------------------------------------------------------------");
        System.out.println("Starting Test: " + name.getMethodName());
        System.out.println("-------------------------------------------------------------------------------------");
    }

    @After
    public void printAfterTestRun() throws Exception
    {
        System.out.println("-------------------------------------------------------------------------------------");
        System.out.println("Finished Test: " + name.getMethodName());
        System.out.println("-------------------------------------------------------------------------------------");
        System.out.println();
    }

    protected void deleteAllObjectsInAllBuckets() throws Exception {

    }

}
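
A quick note on credentials: ClasspathPropertiesFileCredentialsProvider looks for an AwsCredentials.properties file at the root of the classpath (for example under src/test/resources) with accessKey and secretKey entries. If you would rather wire credentials in directly, a sketch of an alternative constructor (placeholder key strings, not what these tests actually use) looks like this:

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;

// Sketch only: build the client with explicit credentials instead of the
// classpath properties file. Replace the placeholders with real keys, and
// keep real keys out of source control.
AmazonDynamoDBClient client = new AmazonDynamoDBClient(
        new BasicAWSCredentials("YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY"));
client.setRegion(Region.getRegion(Regions.US_WEST_2));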

 

DynamoDbTest.java

package com.cherryshoe.services.sdk.dynamodb;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.junit.Before;
import org.junit.FixMethodOrder;
import org.junit.Test;
import org.junit.runners.MethodSorters;

import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.dynamodbv2.model.DeleteTableRequest;
import com.amazonaws.services.dynamodbv2.model.DeleteTableResult;
import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
import com.amazonaws.services.dynamodbv2.model.CreateTableResult;
import com.amazonaws.services.dynamodbv2.model.DeleteItemRequest;
import com.amazonaws.services.dynamodbv2.model.DeleteItemResult;
import com.amazonaws.services.dynamodbv2.model.DescribeTableRequest;
import com.amazonaws.services.dynamodbv2.model.ExpectedAttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
import com.amazonaws.services.dynamodbv2.model.GetItemResult;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;
import com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException;
import com.amazonaws.services.dynamodbv2.model.ReturnValue;
import com.amazonaws.services.dynamodbv2.model.TableDescription;
import com.amazonaws.services.dynamodbv2.model.TableStatus;

// This makes the tests run in name order. Yes, I know good tests shouldn't
// depend on the order they run in...
@FixMethodOrder(MethodSorters.NAME_ASCENDING) 
public class DynamoDbTest extends BaseTestCase {

    private String tableName = "Judy";
    private List<String> idList;

    @Before
    public void setUp() {
        idList = new ArrayList<String>();
    }

    /*
     * Haven't gotten to using the DynamoDb annotations yet...
     */
    @Test
    public void test1_createTableTest() {

        // create the table
        ArrayList<AttributeDefinition> attributeDefinitions = new ArrayList<AttributeDefinition>();
        attributeDefinitions.add(new AttributeDefinition().withAttributeName(
                "Id").withAttributeType("N"));
        // Note:  You don't have to define all the columns at create time, just the key.  It's like an aspect where you can add the columns later

        ArrayList<KeySchemaElement> ks = new ArrayList<KeySchemaElement>();
        ks.add(new KeySchemaElement().withAttributeName("Id").withKeyType(
                KeyType.HASH));

        ProvisionedThroughput provisionedThroughput = new ProvisionedThroughput()
                .withReadCapacityUnits(10L).withWriteCapacityUnits(5L);

        CreateTableRequest request = new CreateTableRequest()
                .withTableName(tableName)
                .withAttributeDefinitions(attributeDefinitions)
                .withKeySchema(ks)
                .withProvisionedThroughput(provisionedThroughput);

        CreateTableResult result = getClient().createTable(request);

        waitForTableToBecomeAvailable(tableName);

    }

    @Test
    public void test2_createRecordsTest() {
        uploadSampleProducts(tableName);
    }

    @Test
    public void test3_retrieveItem() {
        String id = "101";

        retrieveItem(id);
    }

    @Test
    public void test4_deleteTest() {
        // if the list is empty populate it
        if (getIdList().isEmpty()) {
            getIdList().add("101");
            getIdList().add("102");
            getIdList().add("103");
            getIdList().add("201");
            getIdList().add("202");
            getIdList().add("203");
            getIdList().add("204");
            getIdList().add("205");
        }

        // delete items
        deleteItems(tableName);

        // delete table
        DeleteTableRequest deleteTableRequest = new DeleteTableRequest()
        .withTableName(tableName);
        DeleteTableResult result = getClient().deleteTable(deleteTableRequest);
        waitForTableToBeDeleted(tableName);  
    }

    private void retrieveItem(String id) {
        try {

            HashMap<String, AttributeValue> key = new HashMap<String, AttributeValue>();
            key.put("Id", new AttributeValue().withN(id));
            GetItemRequest getItemRequest = new GetItemRequest()
                .withTableName(tableName)
                .withKey(key)
                .withAttributesToGet(Arrays.asList("Id", "ISBN", "Title", "Authors"));

            GetItemResult result = getClient().getItem(getItemRequest);

            // Check the response.
            System.out.println("Printing item after retrieving it....");
            printItem(result.getItem());            

        }  catch (AmazonServiceException ase) {
                    System.err.println("Failed to retrieve item in " + tableName);
        }   

    }

    private void deleteItems(String tableName) {
        try {
            for (String id : getIdList()) {
                Map<String, ExpectedAttributeValue> expectedValues = new HashMap<String, ExpectedAttributeValue>();
                HashMap<String, AttributeValue> key = new HashMap<String, AttributeValue>();
                key.put("Id", new AttributeValue().withN(id));

                //    can add more expected values if you want

                ReturnValue returnValues = ReturnValue.ALL_OLD;

                DeleteItemRequest deleteItemRequest = new DeleteItemRequest()
                    .withTableName(tableName)
                    .withKey(key)
                    .withExpected(expectedValues)
                    .withReturnValues(returnValues);

                DeleteItemResult result = getClient().deleteItem(deleteItemRequest);

                // if the item was available to be deleted
                if (result.getAttributes() != null) {
                    // Check the response.
                    System.out.println("Printing item that was deleted...");
                    printItem(result.getAttributes());
                }

            }        
        }  catch (AmazonServiceException ase) {
                                System.err.println("Failed to get item after deletion " + tableName);
        } 
    }

    private void waitForTableToBeDeleted(String tableName) {
        System.out.println("Waiting for " + tableName + " while status DELETING...");

        long startTime = System.currentTimeMillis();
        long endTime = startTime + (10 * 60 * 1000);
        while (System.currentTimeMillis() < endTime) {
            try {
                DescribeTableRequest request = new DescribeTableRequest().withTableName(tableName);
                TableDescription tableDescription = getClient().describeTable(request).getTable();
                String tableStatus = tableDescription.getTableStatus();
                System.out.println("  - current state: " + tableStatus);
            } catch (ResourceNotFoundException e) {
                System.out.println("Table " + tableName + " is not found. It was deleted.");
                return;
            }
            try {Thread.sleep(1000 * 20);} catch (Exception e) {}
        }
        throw new RuntimeException("Table " + tableName + " was never deleted");
    }

    private void printItem(Map<String, AttributeValue> attributeList) {
        for (Map.Entry<String, AttributeValue> item : attributeList.entrySet()) {
            String attributeName = item.getKey();
            AttributeValue value = item.getValue();
            System.out.println(attributeName + " "
                    + (value.getS() == null ? "" : "S=[" + value.getS() + "]")
                    + (value.getN() == null ? "" : "N=[" + value.getN() + "]")
                    + (value.getB() == null ? "" : "B=[" + value.getB() + "]")
                    + (value.getSS() == null ? "" : "SS=[" + value.getSS() + "]")
                    + (value.getNS() == null ? "" : "NS=[" + value.getNS() + "]")
                    + (value.getBS() == null ? "" : "BS=[" + value.getBS() + "] \n"));
        }
    }

    private void uploadSampleProducts(String tableName) {

        try {
            // Add books.
            Map<String, AttributeValue> item = new HashMap<String, AttributeValue>();
            item.put("Id", new AttributeValue().withN("101"));
            item.put("Title", new AttributeValue().withS("Book 101 Title"));
            item.put("ISBN", new AttributeValue().withS("111-1111111111"));
            item.put("Authors",
                    new AttributeValue().withSS(Arrays.asList("Author1")));
            item.put("Price", new AttributeValue().withN("2"));
            item.put("Dimensions",
                    new AttributeValue().withS("8.5 x 11.0 x 0.5"));
            item.put("PageCount", new AttributeValue().withN("500"));
            item.put("InPublication", new AttributeValue().withN("1"));
            item.put("ProductCategory", new AttributeValue().withS("Book"));

            PutItemRequest itemRequest = new PutItemRequest().withTableName(
                    tableName).withItem(item);
            getClient().putItem(itemRequest);
            item.clear();
            getIdList().add("101");

            item.put("Id", new AttributeValue().withN("102"));
            item.put("Title", new AttributeValue().withS("Book 102 Title"));
            item.put("ISBN", new AttributeValue().withS("222-2222222222"));
            item.put("Authors", new AttributeValue().withSS(Arrays.asList(
                    "Author1", "Author2")));
            item.put("Price", new AttributeValue().withN("20"));
            item.put("Dimensions",
                    new AttributeValue().withS("8.5 x 11.0 x 0.8"));
            item.put("PageCount", new AttributeValue().withN("600"));
            item.put("InPublication", new AttributeValue().withN("1"));
            item.put("ProductCategory", new AttributeValue().withS("Book"));

            itemRequest = new PutItemRequest().withTableName(tableName)
                    .withItem(item);
            getClient().putItem(itemRequest);
            item.clear();
            getIdList().add("102");

            item.put("Id", new AttributeValue().withN("103"));
            item.put("Title", new AttributeValue().withS("Book 103 Title"));
            item.put("ISBN", new AttributeValue().withS("333-3333333333"));
            item.put("Authors", new AttributeValue().withSS(Arrays.asList(
                    "Author1", "Author2")));
            // Intentional. Later we run scan to find price error. Find items >
            // 1000 in price.
            item.put("Price", new AttributeValue().withN("2000"));
            item.put("Dimensions",
                    new AttributeValue().withS("8.5 x 11.0 x 1.5"));
            item.put("PageCount", new AttributeValue().withN("600"));
            item.put("InPublication", new AttributeValue().withN("0"));
            item.put("ProductCategory", new AttributeValue().withS("Book"));

            itemRequest = new PutItemRequest().withTableName(tableName)
                    .withItem(item);
            getClient().putItem(itemRequest);
            item.clear();
            getIdList().add("103");

            // Add bikes.
            item.put("Id", new AttributeValue().withN("201"));
            item.put("Title", new AttributeValue().withS("18-Bike-201")); // Size,
                                                                            // followed
                                                                            // by
                                                                            // some
                                                                            // title.
            item.put("Description",
                    new AttributeValue().withS("201 Description"));
            item.put("BicycleType", new AttributeValue().withS("Road"));
            item.put("Brand", new AttributeValue().withS("Mountain A")); // Trek,
                                                                            // Specialized.
            item.put("Price", new AttributeValue().withN("100"));
            item.put("Gender", new AttributeValue().withS("M")); // Men's
            item.put("Color",
                    new AttributeValue().withSS(Arrays.asList("Red", "Black")));
            item.put("ProductCategory", new AttributeValue().withS("Bicycle"));

            itemRequest = new PutItemRequest().withTableName(tableName)
                    .withItem(item);
            getClient().putItem(itemRequest);
            item.clear();
            getIdList().add("201");

            item.put("Id", new AttributeValue().withN("202"));
            item.put("Title", new AttributeValue().withS("21-Bike-202"));
            item.put("Description",
                    new AttributeValue().withS("202 Description"));
            item.put("BicycleType", new AttributeValue().withS("Road"));
            item.put("Brand", new AttributeValue().withS("Brand-Company A"));
            item.put("Price", new AttributeValue().withN("200"));
            item.put("Gender", new AttributeValue().withS("M"));
            item.put("Color", new AttributeValue().withSS(Arrays.asList(
                    "Green", "Black")));
            item.put("ProductCategory", new AttributeValue().withS("Bicycle"));

            itemRequest = new PutItemRequest().withTableName(tableName)
                    .withItem(item);
            getClient().putItem(itemRequest);
            item.clear();
            getIdList().add("202");

            item.put("Id", new AttributeValue().withN("203"));
            item.put("Title", new AttributeValue().withS("19-Bike-203"));
            item.put("Description",
                    new AttributeValue().withS("203 Description"));
            item.put("BicycleType", new AttributeValue().withS("Road"));
            item.put("Brand", new AttributeValue().withS("Brand-Company B"));
            item.put("Price", new AttributeValue().withN("300"));
            item.put("Gender", new AttributeValue().withS("W")); // Women's
            item.put("Color", new AttributeValue().withSS(Arrays.asList("Red",
                    "Green", "Black")));
            item.put("ProductCategory", new AttributeValue().withS("Bicycle"));

            itemRequest = new PutItemRequest().withTableName(tableName)
                    .withItem(item);
            getClient().putItem(itemRequest);
            item.clear();
            getIdList().add("203");

            item.put("Id", new AttributeValue().withN("204"));
            item.put("Title", new AttributeValue().withS("18-Bike-204"));
            item.put("Description",
                    new AttributeValue().withS("204 Description"));
            item.put("BicycleType", new AttributeValue().withS("Mountain"));
            item.put("Brand", new AttributeValue().withS("Brand-Company B"));
            item.put("Price", new AttributeValue().withN("400"));
            item.put("Gender", new AttributeValue().withS("W"));
            item.put("Color", new AttributeValue().withSS(Arrays.asList("Red")));
            item.put("ProductCategory", new AttributeValue().withS("Bicycle"));

            itemRequest = new PutItemRequest().withTableName(tableName)
                    .withItem(item);
            getClient().putItem(itemRequest);
            item.clear();
            getIdList().add("204");

            item.put("Id", new AttributeValue().withN("205"));
            item.put("Title", new AttributeValue().withS("20-Bike-205"));
            item.put("Description",
                    new AttributeValue().withS("205 Description"));
            item.put("BicycleType", new AttributeValue().withS("Hybrid"));
            item.put("Brand", new AttributeValue().withS("Brand-Company C"));
            item.put("Price", new AttributeValue().withN("500"));
            item.put("Gender", new AttributeValue().withS("B")); // Boy's
            item.put("Color",
                    new AttributeValue().withSS(Arrays.asList("Red", "Black")));
            item.put("ProductCategory", new AttributeValue().withS("Bicycle"));
            getIdList().add("205");

            itemRequest = new PutItemRequest().withTableName(tableName)
                    .withItem(item);
            getClient().putItem(itemRequest);

        } catch (AmazonServiceException ase) {
            System.err.println("Failed to create item in " + tableName + " "
                    + ase);
        }

    }

    private void waitForTableToBecomeAvailable(String tableName) {
        System.out.println("Waiting for " + tableName + " to become ACTIVE...");
        long startTime = System.currentTimeMillis();
        long endTime = startTime + (10 * 60 * 1000);
        while (System.currentTimeMillis() < endTime) {
            try {
                Thread.sleep(1000 * 20);
            } catch (Exception e) {
            }
            try {
                DescribeTableRequest request = new DescribeTableRequest()
                        .withTableName(tableName);
                TableDescription tableDescription = getClient().describeTable(
                        request).getTable();
                String tableStatus = tableDescription.getTableStatus();
                System.out.println("  - current state: " + tableStatus);
                if (tableStatus.equals(TableStatus.ACTIVE.toString()))
                    return;
            } catch (AmazonServiceException ase) {
                if (!ase.getErrorCode().equalsIgnoreCase("ResourceNotFoundException"))
                    throw ase;
            }
        }
        throw new RuntimeException("Table " + tableName + " never went active");
    }

    public List<String> getIdList() {
        return idList;
    }    

}
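
The comment in uploadSampleProducts mentions scanning later for the intentionally mispriced book; there is no scan test in the class above, but a minimal sketch of one (my own addition, reusing getClient() and printItem() from the class and needing a few extra model imports) could look like this:

import com.amazonaws.services.dynamodbv2.model.ComparisonOperator;
import com.amazonaws.services.dynamodbv2.model.Condition;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;

    // Sketch only: scan the table for items whose Price is greater than 1000,
    // which should turn up the intentionally mispriced book 103.
    private void scanForPriceErrors(String tableName) {
        Condition priceCondition = new Condition()
                .withComparisonOperator(ComparisonOperator.GT)
                .withAttributeValueList(new AttributeValue().withN("1000"));

        Map<String, Condition> scanFilter = new HashMap<String, Condition>();
        scanFilter.put("Price", priceCondition);

        ScanRequest scanRequest = new ScanRequest()
                .withTableName(tableName)
                .withScanFilter(scanFilter);

        ScanResult scanResult = getClient().scan(scanRequest);
        for (Map<String, AttributeValue> item : scanResult.getItems()) {
            printItem(item);
        }
    }

Keep in mind that Scan reads the entire table; beyond a small sample data set like this one, a Query against a key or secondary index is the better fit.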

 

The Blog “Initial Thoughts on Amazon S3 and DynamoDB” was originally posted on cherryshoe.blogspot.com

Mule ESB: How to Call the Exact Method You Want on a Spring Bean

June 11th, 2014 by David Milller

The Issue

Mule ESB provides a built-in mechanism to call a Spring bean.  Mule also provides an entry point resolver mechanism to choose the method that should be called on the desired bean.  One such resolver is the property-entry-point-resolver, which means the incoming message includes a property that specifies the method name.  It looks like this:

        <component doc:name="Complaint DAO">
            <property-entry-point-resolver property="daoMethod"/>
            <spring-object bean="acmComplaintDao"/>
        </component>

This snippet means the incoming message must include a property named “daoMethod”; Mule will then invoke the method of the acmComplaintDao bean named by that property.

I’ve had three problems with this approach.  First, you can only specify the bean to be called, then hope Mule chooses the right method to invoke.  Second, Mule is in charge of selecting and providing the method arguments; what if the bean has several overloaded methods with the same name?  Third, only an incoming message property can be used to specify the method name.  This means either the client code invoking the Mule flow must provide the method name (undesirable, since it makes that code harder to read), or the flow design must be contorted so that the main flow calls a child flow solely to provide the method name property.

How I Resolved the Issue

Last week I finally noticed that Mule provides access to a bean registry which includes all Spring beans.  I also noticed that Mule’s expression component allows you to embed arbitrary Mule Expression Language in the flow.  Putting these two together results in much simpler code.  I could replace the above example with something like this:

<expression-component>
     app.registry.acmComplaintDao.save(message.payload);
</expression-component>

“app.registry” is a built-in object provided by the Mule Expression Language; it exposes every bean in the registry, including all Spring beans, by name.

In my mind this XML snippet is much clearer and easier to read than the previous one.  At a glance the reader can see which method of which bean is being called, and with which arguments.  And it fits right into the main flow; there is no need to set up a separate child flow just to specify the method name.

A nice simple resolution to the issues I had with my earlier approach. And the new code is smaller and easier to read!  Good news all around.

 

Alfresco Records Management – An Approach to Implementation Part 1

June 4th, 2014 by Deja Nichols

Implementing Records and Information Management (RIM) can be a juggling act.

Balancing_Components_Of_RIM_Strategy


Many record managers are faced with several major issues in applying RIM best practices to their businesses.  In this two-part blog, I want to go over how I helped implement a seamless records management approach in a mid-sized company using Alfresco, an Enterprise Content Management (ECM) system. There are four general aspects I would like to address in these two blogs:

  1. How we structured our vision with company-wide content policy and standards
  2. How we mapped out what documents we had and the requirements around them
  3. How we configured the system so that it worked for us
  4. How we implemented the solution

Steps for Creating a Records Management Program

In this blog, Part 1, I want to start off with how we got organized in order to leverage records management best practices with Alfresco and the Alfresco Records Management (RM) Module. The objective was to meet our complex records management requirements; gain tighter governance over our unstructured data and content silos; and speed up collaboration efforts within the company.

The Problem

One of the situations we faced, with regard to records management, was the amount of unstructured electronic documentation we had and the numerous retention periods for each type. There were physical records that needed to be ingested into the system as well. On top of that, we had several repositories that we needed to “sift through” and migrate to the new system.  In a company such as ours, there are many different types of government programs, each calling for different documentation and a different retention period. So the retention schedule was complex and had to cover hundreds of thousands of documents. We wanted to get a handle on our electronic documentation by implementing a retention schedule, but we did not want to turn every employee into a Records Manager to do so! We wanted every user to be able to upload their documents quickly and let the system handle the life-cycle from there on out! It sounds too good to be true, but it can be done!

The Solution

First, let’s dive into the subject of records management and policy. In order to build a castle, you had better create the blueprints first. So to build our “castle,” our “blueprint activities” were:

1. Defined business policy and best practices for documents, records and emails

  • This defined the different types of documentation we had company-wide: what was an “official corporate record” and what was just “miscellaneous documentation.” It also detailed how we would manage emails, hard-copy files and storage.

2. Defined a record retention schedule and disposition policy

  • This defined, per our governing bodies, the laws pertaining to the retention of each of our official corporate records, electronic or not. It also detailed how long we, as a company, were going to keep non-records and miscellaneous documentation.

3. Defined a record holds policy

  • This defined the protocol for any necessary “legal holds” or “freezes” on records, and under what circumstances they would apply.

4. Defined vital records policy, disaster recovery and offsite storage policy

  • This plan laid out which records were vital for the company to operate on a daily basis, and how to recover that information quickly if anything were to happen to those records.

In addition to the above activities, our project needed to detail out one additional step:

5. Created a “documents matrix”

This was a simple spreadsheet that tied together all of the RIM and ECM details needed for every document. It listed every document the company produced, and for each one it included:

  • The name of each document that will be uploaded to the system
  • The document type and any sub document types for those documents/records
  • Document status (whether it was an “official record” or not; not all documents are “official records” per the retention schedule, so this matrix helped differentiate the two)
  • Custom metadata that’s needed for each official record (We had at least one “property” – metadata – that was required to be added to a document for retention purposes, which I will go over in more detail later)
  • Originating department and “document owner” for each document/record (those positions in the organization entrusted as the custodian of that record and who had disposition authority over them)
  • Retention schedule (e.g. “Active + 6 years” or “Permanent” or “Current Year + 2” etc.)
  • Criteria for becoming an official record (e.g. “when a contract is approved and signed” etc.)

 

Document Matrix. Please note: this is an example; all information fields may not apply.


Analyzing and documenting everything in this one spreadsheet simplified configuring Alfresco and the Alfresco RM plug-in to fit both our document collaboration and our records management needs. Researching and fully mapping out which documents were official records, which ones were not, what to do with the official records, and when to do it allowed our project team to customize a seamless records management approach. It was important to know what we have, where it goes, and when it goes there; otherwise, it is hard to meet both content management and records management requirements.

After creating these policies, the next thing to do was get them out to the people responsible for implementing them. The general employees did not need to know the retention schedule, but they did receive the policy on what was an “official record,” what was not, what to do with important emails, and so on. To do this we held workshops with the executives and then held departmental luncheons to discuss the basic policies and guidelines that pertained to them. We stressed that employees needed to know which documents were important for the company and which ones were “OK to shred” (in other words, what is an “official record,” with legal ramifications, and what is just a “document,” without them). This concept is nothing new, and most employees already know their own documents cold. They know not to shred a contract, and they know that the weekly company plan, of which everyone got a copy, is theirs to keep or shred. So learning the new policy was not too daunting.

For comprehensive tips on starting an ECM implementation project, see the Armedia whitepaper “Creating an ECM Advisory Board and Program Charter” by Ronda Ringo.

In closing: having worked on the board of my local ARMA (Association of Records Managers and Administrators) chapter for several years, I’ve noticed that companies have difficulty implementing records management because they try to teach records management to each individual employee. That puts all the stress and obligation on the individual employee and adds a considerable amount of time and training; hours that could be put to better use. Employees should not have to become expert Records Managers to be able to manage their electronic content. Set up the way described here, the only thing our employees needed to know was what they already knew (basic details about their documents), and the system does the rest!

 

In my next blog “Alfresco Records Management; An Approach to Implementation – PART 2” I will go over how we used Alfresco and implemented this seamless records management approach.
