Handling Unstructured Data in Biological Research and Clinical Trials


Venkatarajan S. Mathura

Complex information that is available only in a crude, unstructured format makes computational modeling and processing difficult in biological and health care research. To develop a user-friendly information management solution, protocol revisions, process flexibility and user adoptability must all be taken into account. There is a great need to integrate data from several fields of research in a common platform to increase process efficiency and reduce human error. Motivated by cost savings, by the opportunity to apply knowledge mining tools to existing data, and by the need to organize data for regulatory compliance, research groups and drug-discovery-related industries are adopting electronic data capture and management solutions (EDC or EDM). Even small-scale setups need an enterprise-wide information management system that is efficient, lightweight and cost-effective, and flexible enough to accommodate future growth. With the availability of field-specific ontologies, controlled keywords and vocabularies, metadata management and language mapping tools, organizing unstructured data is becoming a feasible task. At the Roskamp Institute, the Bioinformatics Group has developed several information management software packages to support genomics, proteomics, animal colony (vivarium) and clinical data management.
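
As a small illustration of how controlled vocabularies help organize unstructured entries, the Python sketch below maps free-text terms onto preferred terms through a synonym table. The vocabulary is a hypothetical example and is not taken from any specific ontology; a real system would load its terms from a curated source.

```python
# Minimal sketch: map free-text entries onto a controlled vocabulary.
# The vocabulary below is a hypothetical example, not a real ontology.
import re

# Hypothetical controlled vocabulary: preferred term -> known synonyms
CONTROLLED_TERMS = {
    "Alzheimer disease": ["alzheimer's", "alzheimers disease"],
    "mild cognitive impairment": ["mci", "mild cognitive decline"],
    "hypertension": ["high blood pressure", "htn"],
}

def _normalize(text: str) -> str:
    """Lower-case, replace punctuation (except apostrophes), collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())

# Lookup from every normalized variant (including the preferred term itself)
LOOKUP = {}
for preferred, synonyms in CONTROLLED_TERMS.items():
    for variant in [preferred, *synonyms]:
        LOOKUP[_normalize(variant)] = preferred

def map_term(free_text: str):
    """Return the preferred controlled term, or None if the entry is unmapped."""
    return LOOKUP.get(_normalize(free_text))

if __name__ == "__main__":
    for entry in ["Alzheimer's", "High blood pressure", "MCI", "headache"]:
        print(entry, "->", map_term(entry))
```

Unmapped entries return None, so they can be queued for manual curation instead of being silently stored as free text.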

Clinical trials involve multi-site, heterogeneous data generation with complex data input formats and forms. The data should be captured and queried in an integrated fashion to facilitate further analysis. Electronic case report forms (eCRFs) are gaining popularity because they allow clinical information to be captured rapidly. We have designed and developed a flexible, XML-based clinical trials data management framework in the .NET environment that supports efficient design and deployment of eCRFs for collating and analyzing data from multi-site clinical trials. The main components of our system are an XML form designer, a patient registration eForm, reusable eForms, multiple-visit data capture and consolidated reports. A unique ID is used to track the trial, the trial site, the patient and the year of recruitment.
Availability: http://www.rfdn.org/bioinfo/CTMS/ctms.html.
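
The clinical trials framework above rests on two concrete mechanisms: eCRFs defined in XML, and a composite ID covering trial, site, patient and recruitment year. The Python sketch below illustrates both under stated assumptions; the XML element and field names and the TRIAL-SSS-PPPP-YYYY ID layout are hypothetical and are not the format used by the Roskamp system (which is built on .NET).

```python
# Minimal sketch: (1) an eCRF defined in XML and used to validate entries,
# (2) a composite subject ID encoding trial, site, patient and year.
# The XML layout, field names and ID format are hypothetical.
import xml.etree.ElementTree as ET
from dataclasses import dataclass

FORM_XML = """
<form name="baseline_visit">
  <field name="age" type="int" required="true"/>
  <field name="mmse_score" type="int" required="true"/>
  <field name="notes" type="text" required="false"/>
</form>
"""

def load_form(xml_text: str) -> dict:
    """Parse the form definition into {field_name: (type, required)}."""
    root = ET.fromstring(xml_text.strip())
    return {
        f.get("name"): (f.get("type"), f.get("required") == "true")
        for f in root.findall("field")
    }

def validate(form: dict, values: dict) -> list:
    """Return error messages; an empty list means the entry is valid.
    Submitted values are assumed to arrive as strings from the form."""
    errors = []
    for name, (ftype, required) in form.items():
        raw = values.get(name, "").strip()
        if not raw:
            if required:
                errors.append(f"{name}: required")
            continue
        if ftype == "int" and not raw.lstrip("-").isdigit():
            errors.append(f"{name}: expected an integer")
    return errors

@dataclass(frozen=True)
class SubjectId:
    trial: str    # assumed not to contain hyphens
    site: int
    patient: int
    year: int     # year of recruitment

    def __str__(self) -> str:
        # Hypothetical layout TRIAL-SSS-PPPP-YYYY, zero-padded fixed widths
        return f"{self.trial}-{self.site:03d}-{self.patient:04d}-{self.year}"

    @classmethod
    def parse(cls, text: str) -> "SubjectId":
        trial, site, patient, year = text.split("-")
        return cls(trial, int(site), int(patient), int(year))
```

For example, validate(load_form(FORM_XML), {"age": "71", "mmse_score": ""}) returns ['mmse_score: required'], and str(SubjectId("AD01", 3, 42, 2007)) gives AD01-003-0042-2007. Fixed-width, zero-padded fields keep the IDs easy to parse and make them sort by trial, then site, then patient; the trade-off is that trial codes must not contain hyphens.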

PWIMS 1.0: Proteomics Workflow and Information Management System

PWIMS is a software package that systematically manages data in a proteomics laboratory. It is implemented in the LAMPP (Linux-Apache-MySQL-Perl/PHP) environment as a three-tier architecture. The client tier is a web browser that acts as a thin client, using HTTP to request resources and display responses to the user. The middle tier consists of the Apache web server, the PHP scripting language and the Zend engine that executes PHP scripts. The database tier uses the MySQL RDBMS. Data models and entity relationships have been defined for handling data at various levels.
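
To make the data-tier description concrete, the sketch below shows one way entity relationships between projects, gels, target plates and protein hits could be declared and queried. It is an illustration under assumptions: Python's built-in sqlite3 module stands in for MySQL so the code runs without a server, and all table and column names are hypothetical rather than the PWIMS schema.

```python
# Minimal sketch of a proteomics entity-relationship model.
# sqlite3 (standard library) stands in for MySQL; the schema is hypothetical.
import sqlite3

SCHEMA = """
CREATE TABLE project (
    project_id   INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    coordinator  TEXT NOT NULL
);
CREATE TABLE gel (
    gel_code     TEXT PRIMARY KEY,            -- unique tracking code
    project_id   INTEGER NOT NULL REFERENCES project(project_id)
);
CREATE TABLE target_plate (
    plate_code   TEXT PRIMARY KEY,
    gel_code     TEXT NOT NULL REFERENCES gel(gel_code)
);
CREATE TABLE protein_hit (
    hit_id       INTEGER PRIMARY KEY,
    plate_code   TEXT NOT NULL REFERENCES target_plate(plate_code),
    accession    TEXT NOT NULL,               -- e.g. a UniProt accession
    score        REAL NOT NULL
);
"""

def connect() -> sqlite3.Connection:
    """Open an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def hits_for_project(conn, project_id: int, min_score: float):
    """Join protein hits back through plates and gels to their project."""
    return conn.execute(
        """
        SELECT g.gel_code, t.plate_code, h.accession, h.score
        FROM protein_hit h
        JOIN target_plate t ON t.plate_code = h.plate_code
        JOIN gel g          ON g.gel_code   = t.gel_code
        WHERE g.project_id = ? AND h.score >= ?
        ORDER BY h.score DESC
        """,
        (project_id, min_score),
    ).fetchall()
```

With foreign keys enabled, the database rejects, for example, a gel that points at a non-existent project, which complements application-level checks on data entry.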

Some of the features of PWIMS are:

User Management: Multi-user support, password authentication and access restriction based on user role.

Project Management: Project tracking and online availability of experiment status, experiment results, coordinator information, timelines and project date records.

Scheduling & Workflow Control: Tracks the workflow using unique codes for gels and target plates, prevents skipped workflow steps and erroneous data entry, systematically schedules the next step in the process and lists pending jobs.

Data Capture: Form-based data entry, and automatic mapping and transfer of large project files via FTP; data can also be entered in a simple Excel sheet and uploaded.

Data Integration & Analysis: Mass spec results are integrated with gels and projects. Protein hits can be filtered and exported for future reference or for use in other software, e.g., PDQuest. External links to UniProt, PubMed, etc. are provided automatically.

Data Mining & Presentation: Sequence motif search, functional keyword search and advanced queries can be specified. Results are made available for presentation and sharing.
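
The scheduling and workflow-control feature can be sketched as a simple ordered state machine keyed by item code. The step names and the WorkflowTracker class are hypothetical; the sketch only illustrates how step-skipping can be rejected and pending jobs listed, not how PWIMS implements it.

```python
# Minimal sketch of workflow step control keyed by item code (e.g. a gel code).
# Steps must be recorded in order, so none can be skipped or repeated.
# The step names are hypothetical, not the actual PWIMS workflow.
STEPS = ["gel_run", "spot_picking", "digestion", "plate_spotting", "ms_analysis"]

class WorkflowError(Exception):
    """Raised when an entry would skip or repeat a workflow step."""

class WorkflowTracker:
    def __init__(self, steps):
        self.steps = list(steps)
        self.progress = {}  # item code -> number of completed steps

    def next_step(self, code: str):
        """Return the step scheduled next for this item, or None if finished."""
        done = self.progress.get(code, 0)
        return self.steps[done] if done < len(self.steps) else None

    def record(self, code: str, step: str) -> None:
        """Record a completed step, rejecting out-of-order entries."""
        expected = self.next_step(code)
        if step != expected:
            raise WorkflowError(f"{code}: got '{step}', expected '{expected}'")
        self.progress[code] = self.progress.get(code, 0) + 1

    def pending(self):
        """List (code, next_step) for every started item not yet finished."""
        return [(code, self.next_step(code)) for code in self.progress
                if self.next_step(code) is not None]
```

For instance, after recording "gel_run" and "spot_picking" for GEL-0001, pending() reports [('GEL-0001', 'digestion')], and an attempt to record "ms_analysis" next raises WorkflowError.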

GEMAT: Genomics Experiment Management and Analysis Tool

GEMAT is an information management system designed as a client-server tool for handling Affymetrix GeneChip information. It has built-in analytical tools for data mining and for posting microarray data to end users.
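
GEMAT's analytical tools are not described in detail here. As a hedged example of a common first data-mining step on GeneChip expression summaries, the sketch below applies a generic log2 fold-change filter between two conditions; this is a standard technique swapped in for illustration, not GEMAT's implementation, and the probe-set IDs and values are hypothetical.

```python
# Generic log2 fold-change filter on per-probe-set expression summaries.
# Not GEMAT's code. Assumes positive, linear-scale expression values.
import math
from statistics import mean

def log2_fold_changes(control: dict, treated: dict) -> dict:
    """Return {probe_set: log2(mean(treated) / mean(control))} for shared probe sets."""
    result = {}
    for probe in sorted(control.keys() & treated.keys()):
        result[probe] = math.log2(mean(treated[probe]) / mean(control[probe]))
    return result

def filter_by_fold_change(fold_changes: dict, threshold: float = 1.0) -> list:
    """Keep probe sets whose |log2 fold change| >= threshold, largest first."""
    hits = [(p, fc) for p, fc in fold_changes.items() if abs(fc) >= threshold]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)
```

With control = {"probe_1": [100, 120], "probe_2": [200, 180], "probe_3": [50, 50]} and treated = {"probe_1": [400, 480], "probe_2": [190, 210], "probe_3": [6.25, 6.25]}, a threshold of 1.0 keeps probe_3 (log2 fold change -3.0) and probe_1 (2.0) and drops probe_2. A fold-change cut-off alone ignores variability between replicates, so it is typically paired with a statistical test.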