Saturday, September 24, 2011

The under-appreciated role of PLM in implementing Document Creation, Retention, and Destruction (DCRD) policies


Have you heard about the famous (or infamous?) “Pitt’s Postulate”: “Whenever you think you’ve destroyed the last copy of any document, there’s always one more that exists, and it will surface at exactly the most inopportune time. The only exception, of course, is if you really need the document, at which point you’ll discover that you actually did destroy the very last copy.” – Quite true, isn’t it?

Over the last few years businesses have come to appreciate that they need to manage their documents and records better to tackle a number of key issues, including legislation like SOX, the risk of lawsuits, the protection of intellectual property, and identity theft. Most corporations have formalized Document Creation, Retention, and Destruction (DCRD) policies, primarily outlining the length of time for which the organization will keep certain documents and records, either in hard copy or electronic form. These policies serve as guiding principles for employees, indicating which documents to discard and which to save, while ensuring compliance with laws and preventing the inadvertent or improper destruction of records.

Numerous off-the-shelf solutions are available for such purposes (including SharePoint, which prompted me to write this article), but if a company has already implemented a commercial PLM system, then in all probability it does not need another system to take care of DCRD policies. Record/document classification, integrity, storage, access control, data retention, backup, data purging, comprehensive version and revision control, viewable file generation, auditing, search, etc. are all effectively present in an enterprise PLM system.
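As a purely hypothetical illustration of how a DCRD retention schedule might be expressed against PLM document types, here is a minimal Python sketch; the document types, retention periods, and the purge-eligibility check are my own assumptions for illustration, not a reference to any particular PLM vendor's API.

    from datetime import date, timedelta

    # Hypothetical retention schedule: PLM document type -> retention period in years.
    RETENTION_YEARS = {
        "Engineering Change Order": 10,
        "Design Specification": 7,
        "Meeting Minutes": 3,
    }

    def is_purge_eligible(doc_type, release_date, today=None):
        """Return True if a released document has outlived its retention period."""
        today = today or date.today()
        years = RETENTION_YEARS.get(doc_type)
        if years is None:
            return False  # unknown document types are kept until they are classified
        return today >= release_date + timedelta(days=365 * years)  # approximate years

    # Example: an ECO released in 2000 would now be eligible for purge review.
    print(is_purge_eligible("Engineering Change Order", date(2000, 5, 1)))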
 
What are the challenges in using PLM to manage corporate records?

→ The different document types and their value need to be assessed first – certain types of documents, such as employee evaluations or expense reports, might not be suitable for storage in PLM when other systems already exist to manage them.
→ Emails in particular – time and again emails have been the center of attention in litigation, and unfortunately the retention, control, and purging of emails is outside PLM’s scope.
→ Backups – electronic data backups produce historical copies of documents/records which, even though purged from the current system, will be preserved in archives and remain discoverable.
 
My conclusion: PLM is a viable solution, in conjunction with other systems, for aiding the implementation of DCRD policies. If a PLM solution is already in place, a separate Records Management solution does not need to be implemented.

Saturday, September 10, 2011

Securing your PLM Infrastructure (before it’s too late!)

I read an interesting report on Symantec’s website – the average organizational cost of a data breach [in 2010] increased to $7.2 million and cost companies an average of $214 per compromised record, up from $204 in 2009. The full report is here. Also, per the Second Annual Cost of Cyber Crime Study:



Cyber attacks have become common occurrences. The companies in our study experienced 72 successful attacks per week and more than one successful attack per company per week. This represents an increase of 44 percent from last year’s successful attack experience.
Pretty frightening, huh! I am not sure such estimates can be broken down to a granular level, but data loss from a PLM system would also be very expensive, and that needs to be looked into seriously. With this mission I looked at various PLM vendors’ websites – all they claimed was that their system was “highly secure”. But how secure is the question. Is there an established criterion or metric, or has it been quantitatively corroborated? CIMData deals with this topic, though very briefly, in their white paper “Ten Questions to Ask PLM Solution Suppliers - What You Need to Know to Make an Informed Decision”.

The risks are not only external: today users work from everywhere, not just from the office, and several vendors are even releasing iPad apps. Requirements like ITAR, export control, and other security protocols make it all the more imperative to secure your PLM environment. I believe security mechanisms need to be evaluated during the PLM evaluation phase itself; security should not be added as an afterthought.
Some of the security measures that come to mind (by no means a comprehensive list):
→ Application Layer Security
o   Application Security – including on-site and off-site tape storage, OS hardening, virus protection on all servers, etc.
o   User Authentication – access control and a data classification model. Strike the right balance between sharing information and securing it. See Ford’s slides on Product Data Security and Access Management. (A minimal sketch of such a check appears after this list.)
→ Data Management Security
o   Data Encryption – 128-bit Secure Sockets Layer (SSL) data encryption, etc.
o   Database Security – minimal open ports, no scott/tiger or default passwords (I have seen this a lot!), no master passwords controlling access to all systems, and no plain-text (property) file passwords!
o   Lock-down of file vaulting server(s)
→ Systems Security
o   Internal and Operating Systems Security – firewalls, network address translation, port redirection, IP masquerading, non-routable IP addressing schemes, DMZ, intrusion detection systems, etc.
→ Data Center Security
o   Physical security of server farms, including biometric authentication for access. (Though one of my customers had an enterprise-down scenario when rats gnawed through some of their fiber optic cables.)
o   Reliability and Backup – hardware: UPS battery systems, diesel generators, and HVAC systems. (I saw this first hand a few years back when a snow storm hit a customer site in the New England area and the power lines were down; they didn’t have diesel generators and the UPS batteries lasted only an hour or so, causing the servers to crash.) Disaster recovery sites and backup tapes are also important.
o   Water suppression and fire protection in the server room.
o   Social Engineering – do not underestimate the human aspect of security. Ignorant or discontented employees can cause more harm than you can imagine. Former computer hacker Kevin D. Mitnick has a good book on this topic: “The Art of Deception: Controlling the Human Element of Security”.
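To make the access control and data classification point concrete, here is a minimal, hypothetical Python sketch: the classification levels, roles, and clearance table are illustrative assumptions of mine, not the access model of any particular PLM product, which would drive such decisions from its own access control lists and export-control attributes.

    # Hypothetical data classification levels, lowest to highest.
    LEVELS = ["Public", "Internal", "Confidential", "Export-Controlled"]

    # Illustrative role clearances (assumed for this sketch).
    ROLE_CLEARANCE = {
        "Supplier": "Public",
        "Engineer": "Confidential",
        "Compliance Officer": "Export-Controlled",
    }

    def can_view(role, document_classification):
        """Allow access only if the role's clearance meets or exceeds the document's level."""
        clearance = ROLE_CLEARANCE.get(role, "Public")
        return LEVELS.index(clearance) >= LEVELS.index(document_classification)

    print(can_view("Supplier", "Confidential"))            # False
    print(can_view("Compliance Officer", "Confidential"))  # True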
Audits and Standards
Facilities can be designed to withstand extreme elements and to comply with ISO/IEC 27001 standards. SAS 70 Type II, SSAE 16, or ISAE 3402 audits can also be performed. A comprehensive security assessment and mitigation of the risks related to the system would bring peace of mind.
Conclusion
There is a Data Breach Risk Calculator available which will estimate your risk exposure and can calculate, among other things, the cost of a data breach at your company.
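As a rough back-of-the-envelope illustration (not a substitute for the calculator), the per-record figure quoted above can simply be multiplied by the number of exposed records; the record count below is an assumption for illustration.

    # Back-of-the-envelope breach cost estimate using the 2010 figure of $214 per record.
    COST_PER_RECORD = 214      # USD, per the report cited above
    exposed_records = 50_000   # hypothetical number of compromised records

    estimated_cost = COST_PER_RECORD * exposed_records
    print(f"Estimated breach cost: ${estimated_cost:,}")  # Estimated breach cost: $10,700,000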
Organizations should also remember the costs involved – what a mid-tier firm can afford will be appreciably different from what a Fortune 100 firm can. They should prioritize the security features they need by significance and make practical decisions based on their budget.

Tuesday, September 6, 2011

Using Rule-Based or Supervised Document Classification for Legacy Data Migration in a PLM Implementation

I was reading about Latent Semantic Analysis (LSA, also Latent Semantic Indexing) recently and the controversies over whether Google used this technique to rank its search results, though the consensus seems to be that they use much more sophisticated statistical methods of text analysis. Latent semantic analysis (LSA) is a natural language processing technique for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur close together in text.
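As a minimal sketch of the idea (using scikit-learn, which is my own choice and not mentioned above): documents are turned into a term matrix and then reduced to a small number of latent "concepts"; documents that use related words end up close together in that concept space. The sample documents are made up.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical document snippets.
    docs = [
        "engineering change order for bracket assembly revision",
        "change request raised against the bracket drawing",
        "quarterly travel expense report for the sales team",
    ]

    # Term-document matrix, then truncated SVD: the classic LSA recipe.
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(docs)
    concepts = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

    # The two change-related documents should be more similar to each other
    # than either is to the expense report.
    print(cosine_similarity(concepts))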

Latent semantic indexing is closely related to LSA and is used in an assortment of information retrieval and text processing applications, although its primary use is for automated document categorization. Document classification/categorization assigns an electronic document to one or more categories based on its contents. Document classification tasks can be divided into two types: supervised document classification, where some external mechanism (such as human feedback) provides information on the correct classification for documents, and unsupervised document classification (also known as document clustering), where the classification must be done entirely without reference to external information. There is also semi-supervised document classification, where only part of the document set is labeled by the external mechanism, and rule-based classification, where hand-written rules assign the categories. There are open source tools [like Mallet] for statistical natural language processing, document classification, clustering, topic modeling, information extraction, etc.
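The paragraph above mentions Mallet; as a minimal sketch of supervised classification I am substituting scikit-learn here (my choice), with made-up training examples mapped to hypothetical PLM document types.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Hypothetical labeled legacy documents: (text, PLM document type).
    training = [
        ("engineering change order for bracket revision B", "Change Order"),
        ("ECO raised to update the wiring harness drawing", "Change Order"),
        ("test protocol and results for sterilization validation", "Test Report"),
        ("validation test report for pump endurance run", "Test Report"),
        ("standard operating procedure for incoming inspection", "SOP"),
        ("SOP describing the document control process", "SOP"),
    ]
    texts, labels = zip(*training)

    # TF-IDF features feeding a naive Bayes classifier.
    classifier = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
    classifier.fit(texts, labels)

    # Classify an unseen legacy document before loading it into the PLM system.
    print(classifier.predict(["change order updating the harness routing drawing"]))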

The reason I bring up this topic is my experience with legacy data import during PLM implementations. Legacy data migration is tough, to say the least. Stephen Porter gives a good overview here: The PLM State: What’s the big deal about data migration?. Some of my real-life experiences include:

1. Manual scanning of historical documents, manual classification of those documents into folders, and uploading them to the PLM environment using vendor tools, in an FDA-regulated organization’s implementation.

2. Legacy data extraction from a commercial document management system, mapping the data to the vendor’s PLM system, cleaning the legacy data, and finally importing it.

3. Legacy system consolidation – merging numerous home-grown legacy systems into one commercial PLM system.

None of the approaches used was scalable or easy to start with, and the time required could not be guaranteed. In such scenarios, wouldn’t using rule-based or supervised document classification make sense? Admittedly CAD data would be difficult to handle, and historical revisions or intermediate iterations of files between releases might be lost, but for non-CAD data such techniques could make up for the huge investment in time and labor that legacy data migrations demand.
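For completeness, a rule-based pass is even simpler and can run before any statistical model; the patterns and document types below are illustrative assumptions, not rules from any real migration project.

    import re

    # Hypothetical hand-written rules: the first matching pattern wins.
    RULES = [
        (re.compile(r"\b(eco|ecn|change (order|request|notice))\b", re.I), "Change Order"),
        (re.compile(r"\b(test (report|protocol)|validation)\b", re.I), "Test Report"),
        (re.compile(r"\b(sop|standard operating procedure)\b", re.I), "SOP"),
    ]

    def classify(text):
        """Return the PLM document type for a legacy document, or None if no rule matches."""
        for pattern, doc_type in RULES:
            if pattern.search(text):
                return doc_type
        return None  # fall back to supervised classification or manual review

    print(classify("ECN 4512: change notice for the housing drawing"))  # Change Order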

Friday, September 2, 2011

Creative problem solving using Visual Analytics in Product Development


Visual analytics is a new interdisciplinary science aimed at drawing inferences and conclusions from data. In contrast to standard machine learning or statistics, visual analytics emphasizes information visualization, interactivity, and analytic reasoning. [http://smlv.cc.gatech.edu/2010/03/17/what-is-visual-analytics/] It is an outgrowth of the fields of information visualization and scientific visualization, and focuses on analytical reasoning facilitated by interactive visual interfaces. [http://en.wikipedia.org/wiki/Visual_analytics]



If you really want to go into the depths of VA, you can attend a few web lectures from Georgia Tech here.

A few years back digg.com had these great visualizations – Stack, Swarm, and BigSpy – though they seem to be out of service now (see Where Have Digg Labs Gone?).

 

A related discipline is text analytics. The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation. [http://en.wikipedia.org/wiki/Text_analytics]

A tag or word cloud is related to text analytics. We have all seen these tag clouds over the past couple of years. A tag cloud is an informative image that communicates much in a single glance. Word clouds are easy to read, analyze, and compare, and they serve a variety of useful purposes, including visual analysis of qualitative data. For example, using the FDA Medical Devices CFR - Code of Federal Regulations Title 21 and TagCrowd, I created this visualization (clearly showing that the major emphasis of this regulation is on manufacturers).
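The core of a tag cloud is just term frequency. Here is a minimal sketch of my own (not how TagCrowd works internally), assuming the regulation text has been saved locally as cfr_title21.txt; the stop-word list is an arbitrary illustration.

    import re
    from collections import Counter

    STOP_WORDS = {"the", "of", "and", "or", "to", "a", "in", "for", "is", "be", "shall"}

    # Assumes the regulation text was saved locally as cfr_title21.txt.
    with open("cfr_title21.txt") as f:
        words = re.findall(r"[a-z]+", f.read().lower())

    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 3)

    # The most frequent terms are exactly what a tag cloud renders largest.
    for word, count in counts.most_common(10):
        print(f"{word:<15}{count}")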

Coming to the PLM domain, if we can process product data in this way we can arrive at a number of interesting observations very easily. For example, which product has how many change requests? Or which product uses the fewest parts from a standard library? This can be done easily, and if an organization has an enormous amount of data, such visualizations can offer executives great information very intuitively.
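A hypothetical sketch of the change-request example, assuming the data has already been exported from the PLM system as a list of (product, change request) records; matplotlib is my choice for the chart, and the data is invented.

    from collections import Counter
    import matplotlib.pyplot as plt

    # Hypothetical export of (product, change request id) pairs from the PLM system.
    change_requests = [
        ("Pump A", "CR-101"), ("Pump A", "CR-102"), ("Pump A", "CR-103"),
        ("Valve B", "CR-104"),
        ("Sensor C", "CR-105"), ("Sensor C", "CR-106"),
    ]

    counts = Counter(product for product, _ in change_requests)

    # Simple bar chart: which product attracts the most change requests?
    plt.bar(list(counts.keys()), list(counts.values()))
    plt.ylabel("Number of change requests")
    plt.title("Change requests per product")
    plt.show()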


Another great tool I recently saw was from TouchGraph. See this image, which displays a person’s network connections from Facebook.


This sort of visualization, when brought into an enterprise, can easily put a multitude of different things in the picture: for example, how many products down the line a change in a standard part is going to affect. I believe visual analytics will lead to creative problem solving, and faster solutions to problems will drive higher product profitability.
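A minimal sketch of that where-used question, assuming the part-to-assembly relationships have been extracted from the PLM bill of materials into a simple adjacency map; the structure and part names below are invented for illustration.

    # Hypothetical "used-in" relationships extracted from the PLM bill of materials:
    # part/assembly -> the assemblies or products that use it directly.
    USED_IN = {
        "Standard Bolt M6": ["Bracket Assy", "Hinge Assy"],
        "Bracket Assy": ["Pump A", "Valve B"],
        "Hinge Assy": ["Valve B", "Sensor C"],
    }

    def affected_products(part):
        """Walk the where-used graph and return everything affected by a change to the part."""
        affected, stack = set(), [part]
        while stack:
            for parent in USED_IN.get(stack.pop(), []):
                if parent not in affected:
                    affected.add(parent)
                    stack.append(parent)
        return affected

    print(affected_products("Standard Bolt M6"))
    # Bracket Assy, Hinge Assy, Pump A, Valve B, Sensor C (set order may vary)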