Enterprise Content Management

Armedia Blog

Archive for the ‘Architecture’ Category

The CRASH Report

January 10th, 2012 by Scott Roth

Cast Software, a maker of software quality tools, released its second annual CRASH (Cast Report on Application Software Health) report in December. The report assessed the “health” of software applications worldwide by analyzing the source code of 745 applications (~365 million lines of code) from 160 different companies, spanning 10 industry sectors and 8 programming languages. The analysis flagged 1800 different types of development and architecture violations that compromise application “health” in 5 major categories. The categories were:

  • Robustness – The stability of an application and the likelihood of introducing defects when modifying it.
  • Performance – The efficiency of the software’s application layers.
  • Security – An application’s ability to prevent unauthorized intrusions.
  • Transferability – The ease with which an application can be transferred to a new maintenance team.
  • Changeability – An application’s ability to be easily and quickly modified.

Cast has drawn some interesting conclusions in their report. Here are a few I found notable.

  • The most secure applications seem to be large COBOL applications in the financial and insurance industries (that should be reassuring to everyone). The least secure applications were written in .NET.  Yikes!
  • J2EE applications scored worst in performance, primarily because of misunderstood technologies and frameworks.  Another contributing reason could be the high degree of modularity inherent in J2EE applications.
  • Transferability scores for applications in the government sector were lower than in any other industry sector. As a government contractor, I find this one strikes close to home. What conclusions or insights can be gleaned from this finding? One insight the report draws is that government agencies are spending ~73% (on average) of their IT budgets to maintain existing applications, more than any other industry sector. I ask you, where is the money in government IT contracting?
  • Transferability and Changeability scores were highest for applications developed using a classic waterfall style methodology, as opposed to an Agile methodology. Whoa! I didn’t see that one coming. (Cast found that the other three categories, Robustness, Performance and Security, were about equal between waterfall and Agile methodologies.) Perhaps Agile projects are in such a continual state of refactoring that they are never in an ideal state to be transferred.

All of the deficiencies identified in this report are termed “technical debt” and have an average cost (according to Cast) of $3.61 per line of code to repair, except for Java, which rings in at $5.42 per line of code. That’s a lot of money, and it consumes an enormous share of IT budgets.  For the roughly 365 million lines of code used in the study, that’s $1.3 billion of technical debt.
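
For anyone who wants to check the arithmetic, here is a minimal Python sketch. The three constants are the figures quoted above; the function name is my own, and the 100,000-line Java application at the end is a made-up size used purely for illustration.

```python
# Figures quoted in the post (attributed to Cast's CRASH report).
LINES_OF_CODE = 365_000_000    # approximate size of the code sample
COST_PER_LINE_AVERAGE = 3.61   # USD per line, average across languages
COST_PER_LINE_JAVA = 5.42      # USD per line, Java applications


def technical_debt(lines: int, cost_per_line: float) -> float:
    """Estimated repair cost in USD: lines of code times cost per line."""
    return lines * cost_per_line


# Whole study sample at the average rate: about 1.32 billion USD,
# which the post rounds to $1.3 billion.
study_total = technical_debt(LINES_OF_CODE, COST_PER_LINE_AVERAGE)
print(f"${study_total / 1e9:.2f} billion")

# Hypothetical example: a 100,000-line Java application (size invented here).
java_app_debt = technical_debt(100_000, COST_PER_LINE_JAVA)
print(f"${java_app_debt:,.0f}")
```

The same multiplication gives a quick, rough estimate for any code base, with the caveat that the per-line rates are averages from Cast's sample and not a measurement of your own code.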

In conclusion, let me quote Cast’s own conclusion, which I think says it quite well: “The observations from these data suggest that development organizations are focused most heavily on Performance and Security in certain critical applications. Less attention appears to be focused on removing the Transferability and Changeability problems that increase the cost of ownership and reduce responsiveness to business needs. These results suggest that application developers are still mostly in reaction mode to the business rather than being proactive in addressing the long term causes of IT costs and geriatric applications.”

The 22-page executive summary can be downloaded from Cast here:
http://www.castsoftware.com/resources/resource/email-campaigns/cast-report-on-application-software-health?gad=HPH

Beginner Thoughts on Alfresco Architecture

February 15th, 2011 by jhsu

If you had asked me three months ago whether I enjoyed developing against Alfresco’s repository and Share UI (specifically Enterprise 3.3.3), the answer would have been a frustrated “NO…”. It was not frustration about WANTING to learn a new product, since that is always exciting and challenging, but frustration at not understanding the architecture, the BIG PICTURE, well enough to know how to use the product.

I’m not going into any deep personal thoughts on Alfresco ECM architecture today; rather, I am going to share my thoughts on working with the Alfresco repository and the Alfresco Share UI from the standpoint of a software developer who had (almost) no prior experience with Alfresco, or with ECM platforms in general.

I am relatively new to Armedia, and went headfirst into learning Alfresco ECM.  NOTE! It is generally a good idea to get a “crash-course” in a new product from a product expert before jumping in, even if Alfresco has thorough documentation on its Wiki (http://wiki.alfresco.com/wiki/Main_Page).  My coworkers and I had the pleasure of an “Alfresco Architecture Crash-Course” presentation from Dimy Jeannot, who was working on another Armedia Alfresco project.

The high-level architecture is shown below.  The presentation layer represents any number of UI technologies and frameworks (Alfresco Share, jQuery, EXTJS, etc.).  The presentation layer can communicate with the Alfresco repository, or via Alfresco data web scripts (data layer).

High-Level Alfresco Architecture


The more detailed Alfresco architecture is shown below; we use the Alfresco Share UI as our example here.  Share, and any custom Share code you write, calls RESTful presentation and/or data web scripts.  These web scripts, in turn, communicate with the Alfresco Repository through the Java Core Services exposed by Alfresco (e.g. document, search, node), or through any custom services you may need to add for your application.

Never assume that Alfresco and Share are running on the same server host.  For example, you should not hardcode localhost or any other hostname in your Share code.  Instead, use Share’s proxy URL when invoking a data web script (e.g. from a Share ftl, build the address as Alfresco.constants.PROXY_URI + “<URL>”).
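
In a Share page this is done in client-side JavaScript, but the idea is language-neutral: keep the repository address in one configurable place and build every web script URL from it. Here is a minimal sketch of that idea in Python. The PROXY_URI value, the function name and the web script path are all assumptions made for illustration; in Share the real value comes from Alfresco.constants.PROXY_URI.

```python
# Assumed example value. In a real Share page this comes from
# Alfresco.constants.PROXY_URI, never from a literal in your code.
PROXY_URI = "/share/proxy/alfresco/"


def data_webscript_url(proxy_uri: str, script_path: str) -> str:
    """Join the proxy base and a web script path with exactly one slash."""
    return proxy_uri.rstrip("/") + "/" + script_path.lstrip("/")


# "myapp/documents/recent" is a hypothetical data web script path.
url = data_webscript_url(PROXY_URI, "/myapp/documents/recent")
print(url)  # /share/proxy/alfresco/myapp/documents/recent
```

Because the result is relative to the proxy, it contains no hostname, so it keeps working when Alfresco and Share are moved to different servers.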

Where should you store your web scripts?  If you are concerned with data, store your web scripts in the data layer.  When your data web scripts are deployed, they are stored at webapps\<alfresco webapp context>\WEB-INF\classes\alfresco\templates\webscripts; create custom packages under that directory for your application.  If instead you are concerned with presentation (such as a Share dashlet), store your code in the presentation layer (it should then in turn call a data web script to get its data!).  When your presentation web scripts are deployed, they are stored at webapps\<share webapp context>\WEB-INF\classes\alfresco\site-webscripts; again, create custom packages for your application.

Detailed Alfresco Architecture


Looking back now and reading the Alfresco Wiki, the Alfresco architecture makes sense; I just needed guidance on putting the big picture together to start feeling comfortable working with a new product.  If you ask me now whether I enjoy developing with Alfresco, I would say yes <with smiles>.

Speaking IBM FileNet

November 29th, 2010 by jschivera

In my years of working in the content management space I’ve experienced the challenges of switching horses (sorry, I mean platform vendors) more than once. Terminology aside, it’s hard to quickly understand, let alone master, a new platform simply because of how each platform has evolved over the years. The breadth and depth of functionality in most ECM systems these days, let alone their complexity, is enough to take your breath away. Also, the basic DNA of each platform is fundamentally different. I’ve worked with Documentum for many years, going back to before the EMC acquisition.

I used to believe that to be conversant in the Documentum product line I had to install and configure all of their products and add-ons. My thinking: if I hadn’t installed, configured and operated the software, how could I properly architect a solution for a client? Anyone who has worked with the EMC Documentum product stack now knows there simply aren’t enough hours in the year to get that familiar with all of their products. (Just keeping up with the name changes is hard enough.) My ah-ha moment came when I discovered that understanding the underlying philosophy, or rhythm, of how Documentum was designed and evolved would provide the insight into how the various products in the stack worked together. For Documentum, that rhythm means understanding the full object model, appreciating how a product fits into that object model, and understanding its capabilities. Now, though, I am in the process of learning IBM FileNet, and I believe that rhythm can be found in the FileNet Enterprise Reference Architecture (ERA) model.

Understanding how FileNet evolved into the ERA will help an architect develop conceptual solutions for clients. The IBM FileNet ERA is organized into three distinct layers (as described below). These layers represent the IBM FileNet product line from the highest functional level down through functional groups then down to the key capabilities within each functional grouping.

The first layer of the IBM FileNet ERA represents IBM’s highest-level view of the FileNet capabilities. There are nine major functional areas comprising Layer One:

  1. Input, Presentation and Output Services
  2. ECM/BPM Capabilities
  3. ECM/BPM Service Bus
  4. Data Services
  5. Storage Services
  6. Management Services
  7. Security Services
  8. Integration Services
  9. Development Services

IBM FileNet ERA Layer One


(source: IBM)

It is interesting to note IBM’s visual orientation and spacing of the various capabilities. At the core are the ECM/BPM Capabilities and the ECM/BPM Service Bus. All the other areas are arranged to highlight either their supporting nature or their role in input or output processing.

Each subsequent Layer of the ERA further expands the features of the 9 functional areas. Layer Two contains the functional groups, and Layer Three the key capabilities and services. For example, the Layer One functional area ECM/BPM Capabilities contains ten functional groupings at Layer Two, which include areas such as Records Management, Content Management and Digital Asset Management. At Layer Three, the functional grouping Digital Asset Management contains capabilities and services such as Multimedia, Streaming and Transcoding.
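
One way to keep the three layers straight is to think of the ERA as a nested structure. The Python sketch below is my own illustration and not an IBM artifact: it contains only the names mentioned in this post, so seven of the ten Layer Two groupings and most of Layer Three are simply missing, and the function name is hypothetical.

```python
# Layer One area -> Layer Two group -> Layer Three capabilities.
# Only names mentioned in the post are filled in; the rest is left empty.
ERA = {
    "Input, Presentation and Output Services": {},
    "ECM/BPM Capabilities": {
        "Records Management": [],
        "Content Management": [],
        "Digital Asset Management": ["Multimedia", "Streaming", "Transcoding"],
    },
    "ECM/BPM Service Bus": {},
    "Data Services": {},
    "Storage Services": {},
    "Management Services": {},
    "Security Services": {},
    "Integration Services": {},
    "Development Services": {},
}


def locate_capability(era, capability):
    """Return (Layer One area, Layer Two group) for a Layer Three capability."""
    for area, groups in era.items():
        for group, capabilities in groups.items():
            if capability in capabilities:
                return area, group
    return None  # capability not present in this partial model


print(locate_capability(ERA, "Streaming"))
# ('ECM/BPM Capabilities', 'Digital Asset Management')
```

Walking from a capability back up to its functional area is the kind of lookup the ERA diagrams let you do by eye.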

So for me, the rhythm of the IBM FileNet product stack is found within the ERA. The ERA provides a road map to the interoperability and services of the FileNet components, and provides a common reference point for all.

So long, farewell, auf Wiedersehen, eRoom

June 11th, 2010 by cstephenson

A simple goal – “export, transform, load” – the destination is a matter of choice.

EMC eRoom is going away.  It has been marked as End of Life (EOL), so what next?  EMC Documentum offers two options: EMC Documentum Collaboration Services and EMC Documentum Centerstage.  Armedia’s immediate goal is to support Collaboration Services, then Centerstage, but why stop there?  Why limit a client’s choice?

Armedia’s eRoom migration story is in 3 acts (and yes, I have been listening to some test pieces that I used to play in my brass banding days – check out Year of the Dragon by Philip Sparke).

Act I – The Export

Getting the content out of eRoom into an understandable format.  Of course, it’s not just the content; there is a large quantity of metadata in eRoom as well.  Act I – The Export deals solely with interrogating eRoom and generating a document detailing everything about eRoom.  From communities to Files.  From eRoom Setup to databases – we mean everything.  The result: a well-formed XML document.

Act II – The Transformation

As with any classic performance, after the captivating opening, Act II deals with getting to know the characters.  In this case, the transformation gets to know the XML document and gains a deep understanding of the objects held within.  The transformation is also responsible for generating a secondary XML document, which is formed to support ingestion into a new Content Management System (CMS) and / or Collaboration System.  Currently the supported transformation is for EMC Documentum Collaboration Services.  Thanks to the flexible architecture of this utility, it can easily be extended to other targets; it is simply a case of transforming XML.

Act III – The Load

The closing act is the build-up to the dramatic climax which leaves the audience going “WOW!”.  eRoom Migration aims to achieve the same “WOW!”.  Now that the XML has been transformed, you can sit back and let the load run automatically.  That’s it.  By using the ingestion engine of Caliente!, loading all the content and metadata is simple.  Just let eRoom Migration take care of everything for you.  The only thing it does not do is say “WOW!” – we leave that to you.
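
To make the three acts concrete, here is a toy Python sketch of the export, transform, load shape. It is not the eRoom Migration utility or Caliente!: the XML element and attribute names on both sides are invented for illustration, and the load step only lists what a real ingestion engine would create.

```python
import xml.etree.ElementTree as ET

# Act I stand-in: a tiny export document. The element and attribute
# names are invented; the real export format is not shown in this post.
EXPORT_XML = """
<eroom name="Project Alpha">
  <file name="plan.doc" owner="alice"/>
  <file name="budget.xls" owner="bob"/>
</eroom>
""".strip()


def transform(export_xml):
    """Act II: map exported objects onto a (made-up) target ingestion schema."""
    source = ET.fromstring(export_xml)
    target = ET.Element("ingest", {"folder": source.get("name")})
    for exported_file in source.iter("file"):
        doc = ET.SubElement(target, "document",
                            {"object_name": exported_file.get("name")})
        ET.SubElement(doc, "property",
                      {"name": "owner_name", "value": exported_file.get("owner")})
    return target


def load(target):
    """Act III stand-in: list the paths a real ingestion engine would create."""
    folder = target.get("folder")
    return [f"{folder}/{doc.get('object_name')}" for doc in target.iter("document")]


loaded = load(transform(EXPORT_XML))
print(loaded)  # ['Project Alpha/plan.doc', 'Project Alpha/budget.xls']
```

Supporting a different destination would mean writing another transform function that emits that system's schema, while the export and load stages keep their shape.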

Over the next few weeks I plan to talk in more detail about the approach taken and dig deeper into the 3 different pieces of the migration effort.  For those eRoom users, what do you see yourselves using in the near future?

OSGi for business applications

January 29th, 2010 by dmiller

OSGi is a dynamic module system for Java.  An OSGi system is a network of components that communicate via defined interfaces.  Each component is deployable, manageable, and updatable, with predictable effects on the other deployed components.  Each component can be stopped, started, deployed, and undeployed without having to shut down the component host.  Sound familiar?  This is service-oriented architecture (SOA) in a single JVM… All the goodness of SOA, with none of the badness.

If you are a Java developer, or the administrator for a JEE system, most likely you are using OSGi already.  The Eclipse development environment is OSGi-based: Eclipse modules are OSGi components.  Almost every application server is based on OSGi: JBoss, WebLogic, WebSphere.

Using OSGi for a business application should affect your design.  Most systems allocate technical requirements to components.  For the most part, these components live only in the architecture and design.  In almost all the projects I’ve ever worked on, the deliverable Java code lives in big honkin’ jar files with little or no traceability back to the architecture and design.  Inside the jar file, every class has access to every other class, and the internal dependencies end up spreading all over the place.  Reasoning about the code base, or pulling out smaller features for reuse elsewhere, becomes very difficult.

In a well-designed OSGi architecture, each logical component from the architecture and design should become one OSGi bundle (the OSGi jargon for component).  Clients have access only to the public interface of each bundle; the implementation remains inaccessible.  Instead of a handful of big honkin’ monolithic jar files, you end up with many small well-defined bundles.  This direct correspondence of one architecture/design artifact (a component) to one development/runtime artifact (a bundle) is what I like the most about OSGi.

Some benefits of a network of well-defined, small components:

  1. Easy to support different implementations of the same interface.  When development starts, you can quickly release stub bundles of the interfaces your team is responsible for; other teams consume your stub bundle to write their own bundles.  As you complete real implementations of each interface, your clients can replace their stubs with the real thing, without missing a beat.
  2. Clients can upgrade only the bundles they’re interested in.  With one big honkin’ jar file, every client has to upgrade every time you release, even if they don’t need the defect fixes or new features in this release.  With a network of small bundles, each client only gets what they really need.
  3. An end to classpath hell.  The OSGi runtime can host different versions of each bundle.
  4. Supposing you are a software vendor, you can package your applications as a set of bundles.  You can offer a basic cheaper system with a few bundles, and value-added features implemented as extra bundles.  Your clients that don’t want or need a NIEM interface, and your clients that want it very much, are both happy.
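
The first benefit, swapping a stub for the real implementation behind the same interface, can be sketched in a few lines. This is a Python analogy of the idea, not OSGi code: it uses no OSGi API, and every class and function name below is hypothetical.

```python
from abc import ABC, abstractmethod


class DocumentSearch(ABC):
    """The public interface: the only thing clients may depend on."""

    @abstractmethod
    def find(self, keyword):
        """Return the names of documents matching the keyword."""


class StubDocumentSearch(DocumentSearch):
    """Released early so other teams can start coding against the interface."""

    def find(self, keyword):
        return ["stub-result.txt"]


class RealDocumentSearch(DocumentSearch):
    """The finished implementation, delivered later."""

    def __init__(self, documents):
        self._documents = documents

    def find(self, keyword):
        return [name for name in self._documents if keyword in name]


class ServiceRegistry:
    """Toy stand-in for a runtime that hands out services by interface."""

    def __init__(self):
        self._services = {}

    def register(self, interface, implementation):
        if not isinstance(implementation, interface):
            raise TypeError("implementation does not provide the interface")
        self._services[interface] = implementation

    def get(self, interface):
        return self._services[interface]


def client_report(registry, keyword):
    """Client code: knows the interface, never the implementing class."""
    return registry.get(DocumentSearch).find(keyword)


registry = ServiceRegistry()
registry.register(DocumentSearch, StubDocumentSearch())
before = client_report(registry, "plan")

# Swap in the real implementation; client_report is untouched.
registry.register(DocumentSearch, RealDocumentSearch(["plan.doc", "budget.xls"]))
after = client_report(registry, "plan")
print(before, after)  # ['stub-result.txt'] ['plan.doc']
```

The client function never changes between the two calls; only the registration does. In a real OSGi system the container plays the registry's role and also enforces that implementation classes stay hidden, which plain Python cannot do.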

There are some drawbacks to OSGi for business applications.

  1. Currently, using OSGi with plain ol’ Web applications can take some work.  Products like the SpringSource dm Server aim to make this easier.  Also, the OSGi Alliance is working on Enterprise specifications to define how OSGi can be used with JNDI, data sources, JPA, etc.
  2. Java developers have to become a little more aware of design issues.  Most Java programmers I work with want to get their features implemented and defects resolved with as little time and thought as possible.  This is the very genesis of the big honkin’ jar files I keep mentioning… It takes little time or thought to slam out one more class and add it to the one big honkin’ jar file.  This problem should get easier with better tool support.  For example, the Spring Tool Suite and IntelliJ IDEA have first-class support for OSGi development.

InfoQ has an excellent series of blogs to help you get started with OSGi development: Part 1, Part 2, Part 3, Part 4.  Peter Kriens’ blog is a great way to keep up with OSGi news and events.

Copyright © 2002–2011, Armedia. All Rights Reserved.