Armedia Blog

U.S. Government Digital Acquisition Policy Gets an Update

September 11th, 2014 by Scott Roth

You may have seen the news that the U.S. Government has established the U.S. Digital Service, a small team designed "to improve and simplify the digital experience that people and businesses have with their government." On the heels of that announcement came the news that Michael Dickerson, a former Google engineer, has been selected to head up the U.S. Digital Service. And, in conjunction with these announcements, came some initial updates to the U.S. Government's acquisition policies as they relate to software and computing solutions. It is these updates I would like to highlight in this post.

These initial updates come in the form of two documents, the Digital Services Playbook and the TechFAR, which really go hand-in-hand. The Playbook lays out best practices for creating digital services in the government, and the TechFAR describes how these services can be acquired within the confines of existing acquisition policy (i.e., the FAR). The Playbook discusses 13 "plays," or best practices, that should be implemented to ensure delivery of quality applications, websites, mobile apps, etc., that meet the needs of the people and government agencies. Advocating and implementing these plays will be the Digital Service's mission. As a long-time provider of software development services, I wasn't too surprised by any of these best practices – and I suspect you won't be either. However, it was refreshing to see the government finally embrace and advocate them. Here are the Digital Services Playbook plays.

  1. Understand what people need
  2. Address the whole experience, from start to finish
  3. Make it simple and intuitive
  4. Build the service using agile and iterative practices
  5. Structure budgets and contracts to support delivery
  6. Assign one leader and hold that person accountable
  7. Bring in experienced teams
  8. Choose a modern technology stack
  9. Deploy in a flexible hosting environment
  10. Automate testing and deployments
  11. Manage security and privacy through reusable processes
  12. Use data to drive decisions
  13. Default to open

Like I said, you probably weren't surprised by these practices. In fact, if you are a successful software services company, you probably already implement them. But remember, these practices are now being embraced by the U.S. Government, whose acquisition policy has traditionally been geared more toward building battleships than software solutions.

Speaking of acquisition, the TechFAR is a handbook that supplements the Federal Acquisition Regulation (FAR). The FAR is a strict and lengthy body of regulations all executive branch agencies must follow to acquire goods and services. The Handbook is a series of questions, answers, and examples designed to help the U.S. Government produce solicitations for digital services that embrace the 13 plays in the Digital Services Playbook. At first glance, you may not think that implementing these practices would require a supplement like the Handbook, but if you have any experience with the FAR, or with agencies who follow it, you will understand that interpretation and implementation of the regulations varies from agency to agency, and they usually err on the side of caution (i.e., strict interpretation of the policy).

In my experience, the single most difficult thing for a U.S. Government agency to accomplish under the FAR is play #4, the use of agile methodologies to develop software solutions. If you can accomplish this, many of the other plays will happen naturally (e.g., #1, #2, #3, #6, #7, #10). However, the nature of agile development – user stories vs. full system requirements, heavy customer participation vs. simply following the project plan, etc. – seems contrary to the "big design up front" methodology implied by the FAR. That notion couldn't be more wrong: the TechFAR encourages the use of agile methodologies and illustrates how solicitations and contracts can be structured to support them.

Personally, I think the Digital Services Playbook and the TechFAR are a great starting point for improving the quality and success of government software solutions. Official guidance like this brings the U.S. Government's acquisition process in line with how Armedia has always developed software solutions, i.e., using agile methodology. No longer will we have to map our methodology and deliverables to an archaic waterfall methodology just to satisfy FAR requirements.

I think the questions, answers, and examples in the TechFAR are good, and they provide terrific insight both for the government writing solicitations and for industry responding to them. If you sell digital services to the U.S. Government, I encourage you to read these two documents, the Digital Services Playbook and the TechFAR — they're not long. And even if you don't contract with the U.S. Government, the best practices in the Playbook and the advice in the Handbook are probably still applicable to your business.

WordPress Contributors Upload Plugins

August 7th, 2014 by Paul Combs

My previous post, "Allow WordPress Contributors to Upload Images," discussed using the functions.php file to give the contributor role capabilities that aren't there by original design. Because the functions.php file is part of a WordPress theme, if an alternate theme is selected those functions will no longer be available unless the functions.php file in that theme is edited as well. With plugins, however, the functions remain regardless of the active theme.

This function is slightly different from many others in that it makes a persistent change. Even if the plugin is disabled or the function is removed from functions.php, the change remains until it is explicitly revoked. Two plugins are needed: one to grant the capability and one to revoke it.

This first plugin adds the capability for the contributor role to upload content along with their posts. Once the plugin runs, the change takes effect and persists even if the plugin is then disabled. Hence the need for the next plugin.

<?php
/*
Plugin Name: Armedia: Contributor Role Upload Enabler
Description: Adds the capability to the contributor role to upload content. This change is persistent until it is explicitly revoked. Based on the source by Hardeep Asrani.
Author: Paul Combs
Version: 1.0
Author URI: http://www.armedia.com
*/

function allow_contributor_uploads() {
    if ( current_user_can( 'contributor' ) && ! current_user_can( 'upload_files' ) ) {
        $contributor = get_role( 'contributor' );
        $contributor->add_cap( 'upload_files' );
    }
}

add_action( 'admin_init', 'allow_contributor_uploads' );
?>

This second plugin removes the contributor role's capability to upload content. Again, once the plugin runs, the change takes effect and persists even if the plugin is then disabled.

<?php
/*
Plugin Name: Armedia: Contributor Role Upload Disabler
Description: Removes the capability of the contributor role to upload content. This change is persistent until it is explicitly revoked.
Author: Paul Combs
Version: 1.0
Author URI: http://www.armedia.com
*/

function remove_contributor_uploads() {
    if ( current_user_can( 'contributor' ) && current_user_can( 'upload_files' ) ) {
        $contributor = get_role( 'contributor' );
        $contributor->remove_cap( 'upload_files' );
    }
}

add_action( 'admin_init', 'remove_contributor_uploads' );
?>

There are no checks and balances here, so note that if both plugins are enabled the results will not be as expected: because both functions run on the same admin_init hook, each one toggles the capability in turn. A quick test of refreshing a contributor screen with both plugins enabled reveals that the capability is available on every other refresh. For the expected result, enable one or the other.

Allow WordPress Contributors to Upload Images

August 5th, 2014 by Paul Combs

WordPress offers six different roles, ranging from Super Admin to Subscriber. One role, the contributor, permits a user to write and manage their own posts but not publish them. Writing a post and submitting it for approval without images is as easy as it gets, but posts of that nature are rare and can get a little boring. Images help make a post more interesting. However, a contributor cannot upload images with a post. A number of workarounds can be put into place to remedy this, but each can be time-consuming, often-repeated effort. This may not be so bad for a short post, but one with many images can be more challenging than the effort is worth.

Allowing contributors to upload images with their posts would greatly simplify this. One site offers a snippet of code to add to the theme's functions.php file.

if ( current_user_can( 'contributor' ) && ! current_user_can( 'upload_files' ) )
    add_action( 'admin_init', 'allow_contributor_uploads' );

function allow_contributor_uploads() {
    $contributor = get_role( 'contributor' );
    $contributor->add_cap( 'upload_files' );
}

This has been tested as working on WordPress 3.9.1. Here are after and before screenshots of a contributor's admin board. Notice the one on the left (after) has a Media option.

[Screenshot: contributor admin board, after and before]

A contributor may now upload their post with images, ready for someone else to publish. After a successful upload, note from the Media option a couple of differences between images submitted by the contributor and images submitted by others.

This is an image submitted by anyone other than the contributor. Notice that the contributor may only view the image; no other action may be taken.

[Screenshot: image submitted by another user]

This image has a checkbox next to it to allow for bulk actions. The contributor may also Edit or Delete Permanently their own image, as well as view it.

[Screenshot: image submitted by the contributor]

A second contributor account was created to verify that one contributor may only view another contributor's images (as with any other image) while still being able to perform all actions on their own images. The results were as expected.

It is important to note that even if the code is removed from the functions.php file, the contributor role will still have the capability to upload content; the setting is saved to the database and is persistent until explicitly revoked. To explicitly revoke the capability, simply reverse the action by appending the following variation of the code above to the functions.php file.

if ( current_user_can( 'contributor' ) && current_user_can( 'upload_files' ) )
    add_action( 'admin_init', 'remove_contributor_uploads' );

function remove_contributor_uploads() {
    $contributor = get_role( 'contributor' );
    $contributor->remove_cap( 'upload_files' );
}

Although functions.php may be modified with either of the pieces of code provided above, a cleaner and more portable method is to use custom plugins: one plugin to enable uploads and another to disable them. That could be the topic of my next article…

Source(s)
http://codex.wordpress.org/Roles_and_Capabilities
http://www.trickspanda.com/2014/01/allow-contributors-upload-images-wordpress/
http://codex.wordpress.org/Function_Reference/add_cap

How to Export Tabular Data in Captiva 7

July 29th, 2014 by Scott Roth

Armedia has a customer using Captiva 7 to automatically capture tabular information from scanned documents. They wanted to export the tabular data to a CSV file to be analyzed in Excel. Capturing the tabular data in Captiva Desktop proved to be simple enough; the challenge was exporting it in the desired format. Our customer wanted each batch to create its own CSV file, and that file needed to contain a combination of fielded and tabular data expressed as comma-delimited rows.

Here is an example of one of the scanned documents with the desired data elements highlighted.

[Figure: scanned timecard with the desired data elements highlighted]

Here is an example of the desired output.

EMPLOYEE,EID,DATE,REG HRS,OT HRS,TOT HRS
ANDREW MARSH,084224,4/22/2013,7,0,7
ANDREW MARSH,084224,4/23/2013,7.5,1,7.5
ANDREW MARSH,084224,4/24/2013,4,0,9
ANDREW MARSH,084224,4/25/2013,8.5,0,8.5
ANDREW MARSH,084224,4/26/2013,12,0,12
BARB ACKEW,084220,4/22/2013,7,0,7
BARB ACKEW,084220,4/23/2013,9.5,0,9.5
BARB ACKEW,084220,4/24/2013,9.5,0,9.5
BARB ACKEW,084220,4/25/2013,2.5,0,2.5
BARB ACKEW,084220,4/26/2013,8,.5,8

As you can see, the single fields of Employee Name and Employee Number are repeated on each row of the output.  However, because Employee Name and Employee Number were not captured as part of the tabular data on the document, this export format proved to be a challenge.

Here’s what I did:

  1. In the Document Type definition, I created fields for the values I wanted to capture and export (Name, EmployeeNbr, Date, RegHrs, OTHrs, TotHrs).  Here’s how it looks in the Document Type editor:

[Screenshot: Document Type editor]

  2. In the Desktop configuration, I configured:
    • Output IA Values Destination: Desktop
    • Output dynamic Values: checked
    • Output Array Fields: Value Per Array Field
  3. Finally, I created a Standard Export profile that output the captured fields as a text file, not a CSV file. I named the file with a "CSV" extension so Excel could easily open it, but to create the required output format, the file had to be written as a text file.  Here is what the Text File export profile looks like:

[Screenshot: Text File export profile]

The content of the Text file export profile is:

EMPLOYEE,EID,DATE,REG HRS,OT HRS,TOT HRS
---- Start repeat for each level 1 node ----
---- Start repeat for each row of table: Desktop:1.UimData.Hours ----
{S|Desktop:1.UimData.Name},{S|Desktop:1.UimData.EmployeeNbr},{S|Desktop:1.UimData.Date},{S|Desktop:1.UimData.RegHrs},{S|Desktop:1.UimData.OTHrs},{S|Desktop:1.UimData.TotHrs}
---- End repeat ----
---- End repeat ----

By using two nested loops, I was able to access the non-tabular fields, Name and EmployeeNbr, as well as the tabular fields in the same output statement. This looping feature of the Text File export profile saved me from having to write a CaptureFlow script to iterate through all the table variables and concatenate strings for export. A nice feature, but not well documented.

Good Times With VirtualBox Networking

July 24th, 2014 by David Miller

TL;DR version: if you run multiple VirtualBox VMs on the same desktop, set up 3 network interfaces on each such VM (one NAT, one host-only, one bridged).

Now for the long, more entertaining (hopefully!) version:

Recently I switched from VMware Workstation to Oracle VirtualBox for my personal virtualization needs. I'm very happy overall. VirtualBox seems faster to me – when I minimize a VM, do lots of other work, then restore the VM, it is responsive right away, where VMware would page for a minute or two. And each VirtualBox VM is in a separate host window, which I like more than VMware's single tabbed window.

Still, I must say VMware’s networking was easier to deal with.  Here’s how I ended up with 3 IP addresses in each of my local VMs…

I have a CentOS VM running Alfresco and Oracle; a Fedora VM running Apache SOLR and IntelliJ IDEA; and a Windows 2012 Server VM running Active Directory.  I need connectivity to each of them from my host desktop (Windows 8.1), and they need connectivity to each other, and they need to be able to connect to Armedia’s corporate VMs.  Plus,  I’d rather not update my hosts file or IP settings every time  I move between the office and home!

1st VirtualBox network: a Network Address Translation (NAT) network, which allows each VM to talk to the other VMs, but not to any other machine, and does not allow connections from the host desktop. This meets Goal #2 (connectivity to each other), but Goals #1 and #3 are not met yet.

2nd VirtualBox network: a VirtualBox Host-Only network which allows connectivity from the host desktop.  Now Goals #1 (connectivity from the host) and #2 (connectivity to each other) are just fine.

Also, both the NAT and the host-only networks offer stable IP addresses; whether at home or at work, my VMs get the same address each time, so I don't spend 10 minutes updating IP references every time I switch locations.

Danger!  Here is where VirtualBox tricks you!  It seems like Goal #3 (access to corporate VMs) is met too!  With the NAT and host-only IP addresses, I can see our internal websites and copy smaller files to and from the data center VMs.  But if I transfer a larger file, I get a Connection Reset error!  Twice in the last month, I've spent hours tracking down the "defect" in the corporate network settings.  (You'd think I'd remember the problem the second time around, but in my defense the error manifested in different ways.)

Solution?  Add the 3rd VirtualBox network: a bridged network (i.e., bridged to your physical network adapter, so each VM gets an IP address just like the host does, from the corporate/home DHCP server). Now the 3rd goal is really met!  I can transfer files all day long, no worries.
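
For reference, here is roughly how this three-adapter setup might look from the VirtualBox command line (run while the VM is powered off). The VM name, NAT network name, and adapter names below are placeholders; substitute whatever your own setup uses:

VBoxManage modifyvm "CentOS-Alfresco" --nic1 natnetwork --nat-network1 "LocalNatNet"
VBoxManage modifyvm "CentOS-Alfresco" --nic2 hostonly --hostonlyadapter2 "VirtualBox Host-Only Ethernet Adapter"
VBoxManage modifyvm "CentOS-Alfresco" --nic3 bridged --bridgeadapter3 "Intel(R) Ethernet Connection"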

Something to watch out for: when you disconnect a wired Ethernet cable, VirtualBox automatically changes the bridged network to bind to your wireless interface. This is nice, since your VMs automatically get new addresses. BUT! When you plug in the Ethernet cable again (which in my case deactivates the wireless), VirtualBox does NOT switch back to the wired interface! That happened to me this morning. I spent a few hours trying to figure out why my file uploads failed. Finally I saw where VirtualBox had re-bound my bridged network. I changed it back to the wired interface, and all was well.

Alfresco Records Management: An Approach to Implementation Part II

July 22nd, 2014 by Deja Nichols

In the first part of this blog, "Alfresco Records Management: An Approach to Implementation – Part 1," I went over the business case and planning phase for a medium-sized agency that wanted a seamless records management configuration, leveraging Alfresco's Enterprise Content Management (ECM) system and Records Management (RM) module.

To figure out how we wanted to go about design and implementation, and how to configure the system properly, we needed to get an idea of the basic lifecycle of our documents and records. We needed to see where we were going. To build a castle, you need to know how much total space and land you need, what materials are required, and what it is going to cost. Even if it's just a general idea, it's best to map out what you want and what the whole project requires first. You can't just start out with one room of a castle and "see where it takes you." I have personally seen that it is the same with building ECM and RM systems. Different documents can have different lifecycles, but here is a general example of a possible lifecycle for an HR complaint:

[Figure: example lifecycle for an HR complaint]

In this blog, Part 2, I'm going to go over our last two general aspects, covering how we set up and implemented Alfresco in order to accomplish our ideal records management configuration:

  • Configuration
  • Implementation

Steps for Creating a Records Management Program

 

In order to best describe our configuration and implementation phase, I want to go over some very basic aspects of how things were set up in Alfresco. Although we had an older version of Alfresco, most of this was out of the box with little configuration. Here are the basic aspects we created in Alfresco that were important to the layout of the system:

  1. Group sites
  2. Document library within each group site
  3. Document types
  4. Metadata
  5. Record Declaration
  6. Seamless User Interaction
  7. Records Management (RM) Module  (aka RM repository or File Plan)
  8. Workflows for certain documents and records

Alfresco Group Sites Example

 

 

Alfresco Group sites:

To break it down, let's start with the basic structure of our company. Like most companies, we have a hierarchical structure, with about seven departments and about 200 employees. Every employee belongs to one department, so we set up each department with a "group site" in Alfresco (a Human Resources site, Finance site, Legal site, etc.; group sites are an out-of-the-box feature in Alfresco).

Alfresco Document Library:

Each department group site has its own file repository, called a "Document Library" in Alfresco, which, per our records policies, was deemed to be the single-source repository for all of that department's electronic documents.

Document types:

Each document library can be set up with a unique set of "Document Types" that categorize documents into your file taxonomy, and these can be unique to each group site's document library. (For example, the Human Resources document library may have "Employee Contracts" and "Resumes" as two possible document types, while Finance may have "Vendor Contracts" and "Invoices," etc.)

The idea was that, upon uploading a document to their department's document library, an employee was prompted to select a document type. You can also set up sub-document types, if necessary per the retention schedule or file taxonomy.

Metadata:

Then, we configured the system to require the user to enter any applicable metadata for the document they are uploading (as required by our documents matrix in Part 1). Some of our documents needed extra properties (metadata) to help with mapping them to the correct location for retention purposes. For example, for the document type "Resume" we wanted to add the metadata "Name of Employee" so that the system knew which records management folder to put the document in (which I will go over in more detail later in this blog). Each record uploaded typically needed only one extra piece of information to correctly categorize it for records management purposes.
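
As a rough illustration of what backs this kind of configuration, here is a minimal sketch of a custom Alfresco content model that defines a document type with one extra metadata property. The "acme" namespace and the type and property names are hypothetical, not our client's actual model:

<model name="acme:hrModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
   <imports>
      <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d"/>
      <import uri="http://www.alfresco.org/model/content/1.0" prefix="cm"/>
   </imports>
   <namespaces>
      <namespace uri="http://www.acme.com/model/hr/1.0" prefix="acme"/>
   </namespaces>
   <types>
      <!-- document type "Resume" with the one extra property needed for RM mapping -->
      <type name="acme:resume">
         <title>Resume</title>
         <parent>cm:content</parent>
         <properties>
            <property name="acme:employeeName">
               <title>Name of Employee</title>
               <type>d:text</type>
               <mandatory>true</mandatory>
            </property>
         </properties>
      </type>
   </types>
</model>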

Alfresco Records Management Home Page

 

Record Declaration

The last step when uploading a document, which we configured the system to handle, was declaring the document an official record; this option only showed up for document types that had a retention policy associated with them. If the user chose a document type that was predetermined to be a record (as opposed to a non-record), then they were given the option to choose whether or not the document they were uploading should be declared an official record. What this means is that if the document was "complete" and ready to be an "official record," checking that box would immediately declare it an official record upon ingestion. But if the document was still a "work in progress," and not yet an official "record," the user could simply leave the box unchecked and declare the document an official record at a later date, when it was fully completed. (Example: if the document type is a "Contract," and it is still being worked on when it is uploaded, the user would not check the "declare record" box upon ingestion. Later, when that contract gets officially signed, they can declare it a record.)

For us, the word "complete" was defined per document by the document matrix; basically, it is "when a document is considered complete" and/or "at which point it becomes an official record." One example of our declaration criteria was: "After the document (in this case, a contract) is officially approved AND all stakeholders have signed off on it, it can be declared a record." For some records this was not applicable, such as articles of incorporation, bills, financial statements, etc. These were automatically official records upon ingestion and were immediately sent to the RM module for retention, since they were un-editable documents. So anything of those document types was not even given the option to "declare it a record"; it was automatically declared upon ingestion.

Process for Declaring a Record

 

In most cases, we found the user usually knows what they are uploading. They usually upload their own work into the system, and they usually know whether that work is still "in progress" or "complete." We also found that it was not even necessary to teach users the document matrix, because most of them knew what they were working on like the back of their hand. Thus, this method worked for us, and we did not have to turn our end users into records managers! They only needed to know 3 basic things:

1. What the document type was (invoice, contract, financial report)

2. What the metadata was (date of document or name of employee, etc., usually only one piece of extra information was needed)

3. Is it still being edited or otherwise worked on, or can it be declared a record now?

Seamless User Interaction:

We wanted our users to be able to see the records in their own context. What I mean is, we didn't want them having to go look for their documents in two places, and we didn't want them to have to worry about, or even know about, the RM module in Alfresco. All they needed to know was that they upload documents to their group site and the RM works behind the scenes. So we set it up so that if they are searching for documents on their group site, documents that were sent to the RM module (from that site) also show up in the search results. They can open a document, view it, and collaborate on it without ever leaving their group site. (You can also set up a visual indicator marking each document that is an "official record," so you can tell which ones have been sent.)

Alfresco Records Management In Place Records Declaration

 

Alfresco Records Management Module:

When someone sets up the Alfresco RM module, it allows them to create folders in what is called a "File Plan." The Records Manager can then set retention rules (that coincide with the retention schedule) on those folders. From there, documents can be mapped to the File Plan folders (using the documents matrix as a guide) when a document type and metadata combination is placed on a document and the document is declared a record. The File Plan folder runs the retention on the documents from there.

Alfresco Records Management Module File Structure Example

 

When the user selects "Yes" to the question "Declare official record?" (whether upon ingestion or later), it tells the system that this file can now be sent directly to the File Plan in the Alfresco RM module.

Example:

Now let's take a look at a practical example of how a file gets uploaded into the system, ends up in the File Plan, and has retention policies applied to it. (The values in quotation marks below are our input variables.)

Actor: wants to upload an old invoice they found into the Finance group site. The actor first enters the document type: "Invoice." Next, because "Invoice" was selected, the required metadata for that document type pops up, which was configured to be "Year." The actor enters a year: "2007." Since the "Invoice" doc type has a retention period connected to it, per configuration, the actor sees the checkbox for "Official Record?" The actor checks the box, which = true (or yes). Computer: from this input, the system knows exactly:

  • Where to put this file (configured to be: RM Site/File Plan/Finance/Invoices/2007)
  • When to put it there (configured as: "Declare Official Record?" yes = immediately an official record = sent to the File Plan immediately)
  • How long it stays there (the document was placed in the Invoices folder, which keeps records for the current year + 6 years per the rules placed on it by the Records Manager; since it was placed in the "2007" folder, the system knows when to start the retention clock [2007 + 6 years = retained through 2013], so this document is discarded in Jan 2014)

 

Alfresco Records Management: Uploading a Document

 

If a document is not ready to be declared an official record upon ingestion (if you are still editing a contract, for example), then one can keep it in the system and declare it a record when it is ready/complete/approved, etc.

[Figure: typical record lifecycle]

 

This diagram (above) shows the flow of a typical record lifecycle: upload, assign doc type and metadata; if not yet a record, edit and collaborate, then declare it a record later; retain and discard (if applicable), etc. Upon upload, the document has its entire life already mapped out for it, depending on the configuration of the document types, metadata, and File Plan. (Please note, some official records are never discarded and have a "Permanent" retention. The File Plan can accommodate these types of files as well; the above model would just need to be slightly modified to account for them. As discussed earlier in this blog, you don't even need to ask whether a document is an official record for permanent record types.)

Workflows:

Another popular option when declaring a record is to put the document through a workflow that takes it through an approval process; once the document gets approved via the workflow, it automatically declares itself a record. From that point on, the system knows all the information it needs in order to retain the record and, if applicable, dispose of it per policy.

When creating workflows, it is important to know what kinds of workflows your company needs for records and documents/files. This may be intimately connected to the lifecycle and management of your records, so I suggest keeping workflows in mind when mapping out your system. For more information from Armedia about workflows in Alfresco, see our blog subsection on Alfresco Workflows.

In closing:

This approach is primarily a "day forward" solution. When it comes to migration, a different approach may be needed so files can be ingested into the new system and arrive at the correct location within Alfresco.

Also, I would like to note that this approach might work best with a customized user interface, for more flexibility.

There are many different ways to go about implementing records management, and companies need flexible customization that will work for their business processes and records management needs. This method can help you get started on your own configuration and implementation.

For more information on records management, check out our white paper, "Records Management: An Approach to Getting Started."

To read more Armedia blogs about Alfresco, see these links: Alfresco Records Management, Alfresco ECM

Exciting News for Blind or Visually Impaired People Using SharePoint

July 17th, 2014 by Doug Loo

AFB Consulting (AFBC), the consulting division of the American Foundation for the Blind (AFB), conducted a comprehensive accessibility and usability evaluation of the Discover 508 for SharePoint software from Discover Technologies of Reston, VA. The product evaluation compared out-of-the-box (OOTB) SharePoint, in both SP 2010 and SP O-365, with Discover 508 for SharePoint, a software solution designed to make SharePoint easier to use. AFBC tested and compared these products based on how well they interact with the various screen reader software products used by people with vision loss to access Windows computers.

Testing results clearly illustrate that Discover 508 has significant usability advantages over the out-of-the-box experience in both environments. These usability and accessibility advantages allow a blind or visually impaired person to complete tasks much more easily and quickly than in the OOTB environment. The superior usability comes largely from a more intuitive, well-designed architecture that is easier to navigate and more suitable for efficiently accomplishing tasks. It also lacks the inaccessible pop-ups and other features designed with only sighted users in mind. The clear and easy-to-use set of instructions provided by Discover 508 is another significant advantage over the OOTB experience, providing step-by-step guidance and allowing a beginner to learn how to use the system.

Discover 508 provides an environment that is more suitable for use with screen readers, with markup that includes, among other things, properly coded headings, properly labeled links and form elements, and properly formatted and tagged tables. Properly tagged headings allow a person using a screen reader to quickly navigate to the headings that indicate the important sections of the page, and they also allow screen reader users to get a better concept of the overall layout and logical hierarchy of the page. Discover 508's properly labeled form elements let screen reader users determine things like the purpose of a particular edit field, such as Document Title or Date. They also help make combo boxes, check boxes, and radio buttons easier to use. Discover 508 also avoids the poorly formatted and tagged tables experienced in the OOTB environment.

With the Discover 508 for SharePoint solution, AFB testers found it substantially easier to manage calendar events, upload and edit documents, and collaborate with team members. Time spent learning the system and completing individual tasks was significantly shorter when using Discover 508. While SharePoint has made progress with its "More Accessible" mode, Discover 508 clearly stands out as the more accessible, usable solution. Although testers could eventually complete most tasks attempted in the out-of-the-box environment, there were some inaccessible tasks that could not be completed without sighted assistance. The level of frustration and confusion was also significant. For example, simply changing the name of a document took nearly 20 minutes in an initial attempt. Adding a folder to a document library is an example of the difficult and sometimes illogical nature of the OOTB experience: rather than beginning the process with something intuitive like a "New Folder" or "Add Folder" link, the user first has to activate a "New Document" link. AFBC usability testers spent nearly 40 minutes trying to determine how to create a document library, including time spent with SharePoint's online help instructions, some of which were helpful and some of which were not. The instructions that did get the job done said to go to "Settings" and then "Add an App," which obviously lacks a logical or intuitive path.

Discover 508 for SharePoint avoids all that difficulty and confusion. The experience with the Discover 508 solution was much more intuitive and streamlined, giving a person with vision loss the ability to complete each task as effectively and efficiently as his or her sighted peers. This is extremely important in today’s competitive job market, giving people with vision loss the ability to compete on an even playing field with their sighted peers.

ACM: Introduction to Data Access Control

July 16th, 2014 by David Miller

Background

Armedia Case Management (ACM) is a framework for developing case management applications.

Data Access Control ensures each user sees only the records they are authorized to see and is prevented from seeing unauthorized records. Data access control is applied to individual business objects, as opposed to role-based access control, which is based only on the user's identity.

Role-based access is usually applied to each URL in a REST-oriented application.  It ensures the user is authorized for that URL; for example, that only document approvers can invoke the “/approveDocument” URL.  But role-based access by itself means any document approver can approve any document.

Spring Security easily integrates with Spring MVC to enable URL-based access control.  How can we easily add the logic to ensure that only Document A’s approver can approve Document A, and only Document B’s approver can approve Document B?  Not to mention ensuring that, until the document is approved, only users involved in the draft and approval process can even see it – so that it does not appear in anyone else’s search results or queries?

Straightforward Custom Applications

If the application is written for a single customer and implements a straightforward process with the same rules for everyone, the easy path is to build these controls into the application code.  Embed the rules in the query logic such that the database only returns appropriate rows; add checks in the document approval logic to ensure the user is on the approver list for that document.

If you design and build this application very carefully, and you understand the customer very well, and their requirements do not change faster than you can update the application, then this approach can work. I've written many such applications, and they were very successful; the users were happy, and all was well.

Larger, More Diverse Customers

The larger the customer organization, and the more departments and geographic regions being served, the harder it gets to implement data access logic in code. I tried this approach for a large government agency where each region had slightly different rules. The implementation got pretty difficult. The queries became long, and the results came back much more slowly; the database isn't really meant to apply sophisticated access control rules over very large data sets. Let's just say the customer was less happy.

Many Different Customers with Different Problem Domains

Let's extend the previous scenario to the ACM arena, where the framework has to satisfy entirely different customers, each of whom has radically different rules and even different types of business objects. Now the straightforward approach of implementing access logic in code amounts to a commitment to rewrite the entire application for each customer. Now my Armedia leadership team (the people who sign my paychecks!) is less happy!

The ACM Solution

In ACM, we have a flexible solution.  We have a fast way to implement the most common types of rules and we also provide a mechanism to implement more complicated rules.  And we have a fast, reliable way to evaluate the data access controls at runtime.

My next few blog posts will explore this solution in more detail.

In a nutshell, ACM requires the results of all data access control rules to be embodied in a database table: not the rules themselves, but the results of each rule as applied to each domain/business object. This table holds the access rules for each individual domain/business object. Each domain/business object's rules are also indexed in the Apache SOLR search engine. This allows ACM to encode the current user's access rights (group membership and any other access tokens) as a set of boolean restrictions in the SOLR search query. SOLR is designed to efficiently evaluate multiple boolean search conditions. This gives us fast search results that include only the domain/business objects the user is allowed to see.
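
To make this concrete, here is a minimal sketch of such a query using the SolrJ client (SolrJ 4.x era API). The collection URL, the allow_acl field name, and the access tokens are hypothetical illustrations, not ACM's actual schema:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AccessControlledSearch {
    public static void main(String[] args) throws SolrServerException {
        // quick search collection (hypothetical URL)
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/acmQuickSearch");

        SolrQuery query = new SolrQuery("complaint");

        // the user's access tokens (user id plus group memberships) become a
        // boolean filter, so only objects whose ACL field matches are returned
        query.addFilterQuery("allow_acl:(u_dmiller OR g_investigators OR g_all_users)");
        query.setStart(0);
        query.setRows(10);

        QueryResponse response = solr.query(query);

        // hit counts and pagination stay accurate because SOLR itself
        // excluded the documents this user may not see
        System.out.println("Total hits: " + response.getResults().getNumFound());
    }
}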

More to come – stay tuned!

Click here to see all blogs about Armedia Case Management.

 

Security and User Access Control in Ephesoft 3.1

June 30th, 2014 by Chae Kim

 

Overview of Ephesoft Security

Ephesoft version 3.1 introduces Single Sign-On (SSO) along with other great new features, such as new extraction tools (Fuzzy Key Field, Zone Extraction, Regular Expression Builder), new classification tools (Test Classification Tool, Advanced DA Switch, Regex Classification Switch), application-level scripting, email batch processing, and RecoStar TIFF conversion, making Ephesoft more enterprise-ready than ever before. While working on the Ephesoft SSO and user access control implementations for my client, I had a chance to explore Ephesoft's adherence to the CIA principles: Confidentiality, Integrity, and Availability of information.

Also known as the CIA triad, CIA is a fundamental concept in information security, often used as a guideline for information security policies within an organization. Confidentiality refers to limiting information access and disclosure to authorized users and preventing access by unauthorized users, thereby protecting information from any unauthorized access. Integrity refers to the accuracy and trustworthiness of information and to protecting information from unauthorized modification. Finally, Availability refers to a guarantee of access to the information by authorized users.

In this blog, I want to concentrate on the "Confidentiality" leg of that triad. The following Ephesoft features ensure the confidentiality of the Ephesoft document capture system:

  • SSO through HTTP(S) header request – user-based authentication, which ensures data confidentiality, is controlled at the organization level.
  • Integration with secure MS Active Directory or LDAP – in addition to user authentication, user authorization can be provided based on roles configured in a secure MS Active Directory or LDAP server.
  • Role-based user access control – user access control ensures that only users with valid roles can access different areas of the application, and that the information in Batch Classes and batch instances reaches only the right people. The following are examples of role-based user access control:
  1. Security Constraint
  2. Batch Class Access
  3. Batch Instance Access

The Ephesoft 3.1 Product Documentation, available on the Ephesoft website, provides detailed information on SSO and user management using MS AD, LDAP, and Tomcat. Please refer to this link for more information.

Examples of User Access Control

In addition to the Product Documentation, we can further explore examples of role-based user access control here.

Security Constraint

Role-based application access control lets you limit access to the Ephesoft user interfaces. The following table shows each UI represented as a web resource, with a suggested role type for each resource.

[Table: Ephesoft UI web resources and suggested role types]

 

If needed, the role types can be more specialized, such as Scan Operator, Review/Validate Operator, etc. The following table shows an example of specialized roles and the web resources accessed by each role type.

[Table: example specialized role types and their web resources]

 

Below is an example of the “batch list” security constraint configured in <Ephesoft Installation Path>\Application\WEB-INF\web.xml.

	<security-constraint>
		<web-resource-collection>
			<web-resource-name>batch list</web-resource-name>
			<url-pattern>/ReviewValidate.html</url-pattern>
			<http-method>GET</http-method>
			<http-method>POST</http-method>
		</web-resource-collection>
		<auth-constraint>
			<role-name>ReviewValidate Operator</role-name>
			<role-name>Scan Operator</role-name>
		</auth-constraint>
	</security-constraint>

*Please note that for Ephesoft 3.1, if SSO is in use, the security constraints need to be commented out in web.xml, because security constraints in conjunction with SSO are not fully supported yet. Ephesoft expects to provide security constraints fully compatible with SSO in the next major patch release.

 

Batch Class Access

Ephesoft makes it very simple to apply role-based access to Batch Classes and batch instances. Simply navigate to the Batch Class Configuration section and pick the role you want, as shown below.

[Screenshot: Batch Class role selection window]

 

Each Batch Class can be configured with available user role(s), so only users who belong to those role(s) can access the Batch Class and the batches created from it. This Batch Class user access control can be very useful for providing variable scan processing for unique group or departmental usages within a large organization. Ephesoft can be shared by multiple departments within an organization, with each department seeing only the Batch Class and batch instances relevant to that department.

Batch Instance Access

It is a common practice for multiple groups or departments within an organization to share a single Ephesoft system, and if their processes differ, each department can have a separate Batch Class to handle its scanning needs, as explained in the Batch Class Access section. However, multiple Batch Classes for the same process can be difficult to maintain. If a single Batch Class needs to be shared by multiple departments, the batch instance group feature can provide a customized view of the batch list, with dynamic assignment of a user role to each batch instance.

The Batch Instance Group feature, using one of the Ephesoft application database tables, batch_instance_groups, allows you to assign a group name to each batch instance through simple custom scripting. The method below is an example, developed from the example script ScriptDocumentAssembler_BatchInstanceGrouFeature.java, which is available for download from the Ephesoft Script Guide.

// Assigns a group name to the current batch instance. This method is meant to run
// inside an Ephesoft custom script; documentFile (the parsed batch.xml), batchGroup
// (the group name to assign), and the BATCH_INSTANCE_ID element-name constant are
// assumed to be defined elsewhere in the script, as in the Ephesoft example script.
// Imports needed: java.io.*, java.sql.*, java.util.Properties, org.jdom.Element
private void assignBatchInstanceGroup() {
	// get the batch ID from batch.xml
	Element root = documentFile.getRootElement();
	Element batchInstanceID = root.getChild(BATCH_INSTANCE_ID);
	if (batchInstanceID == null) {
		return;
	}
	String batchID = batchInstanceID.getValue();

	if (batchID != null && batchGroup != null) {
		// retrieve the DB connection info from the dcma-db.properties file
		Properties prop = new Properties();
		String pathHome = System.getenv("DCMA_HOME");
		String pathProp = "WEB-INF/classes/META-INF/dcma-data-access/dcma-db.properties";
		File propFile = new File(pathHome, pathProp);
		InputStream input = null;
		try {
			input = new FileInputStream(propFile);
			prop.load(input);
		} catch (IOException e) {
			e.printStackTrace();
			return;
		} finally {
			if (input != null) {
				try { input.close(); } catch (IOException e) { }
			}
		}

		// get the URL, username, and password to make the DB connection
		String username = (String) prop.get("dataSource.username");
		String password = (String) prop.get("dataSource.password");
		String driverClassName = (String) prop.get("dataSource.driverClassName");
		String databaseName = (String) prop.get("dataSource.databaseName");
		String serverName = (String) prop.get("dataSource.serverName");
		String url = (String) prop.get("dataSource.url");

		url = url.replace("${dataSource.serverName}", serverName);
		url = url.replace("${dataSource.databaseName}", databaseName);

		// execute a SQL insert to assign the group name in the batch_instance_groups
		// table; a PreparedStatement avoids quoting and SQL injection problems
		Connection conn = null;
		PreparedStatement stmt = null;
		try {
			Class.forName(driverClassName);
			conn = DriverManager.getConnection(url, username, password);

			String sqlInsert = "insert into batch_instance_groups"
					+ " (creation_date, last_modified, batch_instance_id, group_name)"
					+ " values (Now(), Now(), ?, ?)";
			stmt = conn.prepareStatement(sqlInsert);
			stmt.setString(1, batchID);
			stmt.setString(2, batchGroup);
			stmt.executeUpdate();
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			if (stmt != null) {
				try { stmt.close(); } catch (SQLException e) { }
			}
			if (conn != null) {
				try { conn.close(); } catch (SQLException e) { }
			}
		}
	} else {
		System.err.println("Cannot assign Batch Instance Group - missing Batch ID and/or User Group.");
	}
}

 

Depending on the batch group assignment logic of your choice (based on, e.g., the scan operator, info on the batch cover sheet, document type, or metadata extracted from documents), you can dynamically assign different role groups to each batch and provide specialized access to the batch instances.

As you can see from the examples of user access control mentioned in this blog, Ephesoft is capable of providing multiple layers of user access control that are easy to apply and configure. Ephesoft not only makes it easy to process your documents, but also puts you in full control of user access to your valuable information.

Adding Full Text Search to ACM via Spring and JPA

June 24th, 2014 by David Miller

What, No Full Text Search Already?

My project Armedia Case Management (ACM) is a Spring application that integrates with Alfresco (and other ECM platforms) via CMIS – the Content Management Interoperability Standard. ACM stores metadata in a database and content files in the ECM platform. Our customers so far have not needed integrated full text search; plain old database queries have sufficed. But we knew full text search would eventually have to be addressed. Why not now, since ACM has been getting some love? Plus, high-quality search engines such as SOLR are free, documented in excellent books, and could provide more analytic services than just plain old search.

Goals

What do we want from SOLR Search integration?

  1. We want both quick search and advanced search capabilities.  Quick search should be fast and search only metadata (case number, task assignee, …).  Quick search is to let users find an object quickly based on the object ID or the assignee.  Advanced search should still be fast, but includes content file search and more fields.  Advanced search is to let users explore all the business objects in the application.
  2. Search results should be integrated with data access control.  Only results the user is authorized to see should appear in the search results.  This means two users with different access rights could see different results, even when searching for the same terms.
  3. The object types to be indexed, and the specific fields to be indexed for each object type, should be configurable at run time.  Each ACM installation may trace different object types, and different customers may want to index different data.  So at runtime the administrator should be able to enable and disable different object types, and control which fields are indexed.
  4. Results from ACM metadata and results from the content files (stored in the ECM platform) should be combined in a seamless fashion.  We don’t want to extend the ECM full-text search engine to index the ACM metadata, and we don’t want the ACM metadata full text index to duplicate the ECM engine’s data (we don’t want to re-index all the content files already indexed by the ECM).  So we will have two indexes: the ACM metadata index, and the ECM content file index.  But the user should never be conscious of this; the ACM search user interface and search results should maintain the illusion of a single coherent full text search index.

Both Quick Search and Advanced Search

To enable both quick search and advanced search modes, I created two separate SOLR collections.  The quick search collection includes only the metadata fields to be searched via the Quick Search user interface.  The full collection includes all indexed metadata.  Clearly these two indexes are somewhat redundant since the full collection almost certainly includes everything indexed in the quick search collection.  As soon as we have a performance test environment I’ll try to measure whether maintaining the smaller quick search collection really makes sense.  If the quick search collection is not materially faster than the equivalent search on the full index, then we can stop maintaining the quick search collection.

Integration with Data Access Control

Data access control is a touchy issue, since the full text search queries must still be fast, pagination must continue to work, and the hit counts must still be accurate. These goals are difficult to reach if application code applies data access control to the search results after they leave the search engine. So I plan to encode the access control lists into the search engine itself, so that access control becomes just another part of the search query. Search Technologies has a fine series of articles about this "early binding" architecture: http://www.searchtechnologies.com/search-engine-security.html.
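
As a sketch of what this early binding might look like at indexing time (again with SolrJ; the field names and token values are hypothetical placeholders, not ACM's real schema):

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AclIndexer {
    public static void main(String[] args) throws SolrServerException, IOException {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/acmAdvancedSearch");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "complaint-1001");
        doc.addField("object_type", "COMPLAINT");
        doc.addField("title", "Damaged shipment complaint");

        // the evaluated result of the access control rules is stored on the
        // document itself, so the query-time filter can match against it
        doc.addField("allow_acl", "u_dmiller");
        doc.addField("allow_acl", "g_investigators");

        solr.add(doc);
        solr.commit();
    }
}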

Configurable at Runtime

ACM has a basic pattern for runtime-configurable options.  We encode the options into a Spring XML configuration file, which we load at runtime by monitoring a Spring load folder.  This allows us to support as many search configurations as we need: one Spring full-text-search config file for each business object type.  At some future time we will add an administrator control panel with a user interface for reading and writing such configuration files.  This Spring XML profile configures the business object to be indexed.  For business objects stored in ACM tables, this configuration includes the JPA entity name, the entity properties to be indexed, the corresponding SOLR field names, and how often the database is polled for new records.  For Activiti workflow objects, the configuration includes the Activiti object type (tasks or business processes), and the properties to be indexed.
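
Purely as an illustration of the shape such a file might take (the bean class and property names below are hypothetical, not ACM's actual configuration), a per-object-type config could look something like this:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- hypothetical indexing configuration for one business object type -->
    <bean id="caseFileIndexerConfig" class="com.example.acm.search.IndexerConfig">
        <property name="jpaEntityName" value="CaseFile"/>
        <property name="pollIntervalSeconds" value="60"/>
        <property name="solrFieldMappings">
            <map>
                <entry key="caseNumber" value="case_number_s"/>
                <entry key="assignee" value="assignee_s"/>
                <entry key="status" value="status_s"/>
            </map>
        </property>
    </bean>
</beans>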

Seamless Integration of Database, Activiti, and ECM Data Sources

The user should not realize the indexed data is from multiple repositories.

Integrating database and Activiti data sources is easy: we just feed data from both sources into the same SOLR collection.

The ECM already indexes its content files.  We don’t want to duplicate the ECM index, and we especially don’t want to dig beneath the vendor’s documented search interfaces.

So in our application code, we need to make two queries: one to the ACM SOLR index (which indexes the database and the Activiti data), and another query to the ECM index.  Then we need to merge the two result sets.  As we encounter challenges with this double query and result set merging I may write more blog articles!

Closing Thoughts

SOLR is very easy to work with. I may use it for more than straightforward full text search. For example, the navigation panels with the lists of cases, lists of tasks, lists of complaints, and so on include only data in the SOLR quick search collection. So in theory we should be able to query SOLR to populate those lists, versus calling JPA queries. Again, once we have a performance test environment, I can tell whether SOLR queries or JPA queries are faster in general.

Stay up-to-date on all my blogs about my Armedia Case Management projects. 
