A Fuzzy clustering Algorithm for Drug Design using Biological Properties


A Fuzzy clustering Algorithm for Drug Design using Biological Properties is an internship project for MS-IT students of IIITM-K.


The members are

1. Abhilash A

2. Ajay S Ani

3. Prabu J

4. Praveen Kumar V

5. Sanil Shanker K P (Association)

6. Dr. Elizabeth Sherly (Supervisor)

7. Dr. Manoj K (Consultant)

8. Dr. Gopinath M S (Advisor)

Meeting

We will be having a meeting of all the members of the project "A Fuzzy clustering Algorithm for Drug Design using Biological Properties" tomorrow, 12-04-07.

Abstract (version 0.2)


Objective

To develop a system to create, manage and analyze chemical structures and their data in order to provide a convenient and straightforward approach for the analysis of chemical and biological data. The system should be a database-centric environment that supports query and sorting functionality and handles large volumes of data, so that a molecule having a particular structure can be traced based on its partial charge. The basic idea is to utilize large existing pharmaceutical databases as input for a new type of structure/activity correlation methodology in order to calculate a large set of new and traditional descriptors to create improved Quantitative Structure-Activity Relationship (QSAR) models that characterize and predict important biological responses.

Introduction

All marketed drugs today target only about 500 gene products. The elucidation of the human genome, which has an estimated 30,000 to 40,000 genes, presents immense new opportunities for drug discovery and simultaneously creates a potential bottleneck regarding the choice of targets to support the drug discovery pipeline. The major advances in genomics and sequencing mean that finding an attractive target is no longer a problem; finding the targets that are most likely to succeed has become the challenge.

The focus of bioinformatics in the drug discovery process has therefore shifted from target identification to target validation. Many factors concerning a candidate target need to be taken into account, drawn from a multitude of heterogeneous resources. The types of information that one needs to gather about potential targets include many chemical and biological parameters such as nucleotide and protein sequencing information, homologues, mapping information, function prediction, pathway information, disease associations, variants, structural information, gene and protein expression data and species/taxonomic distribution, among others. Different bioinformatics tools can be used to gather this information. Accumulating this information about potential targets in databases means that pharmaceutical companies can save much of the time, effort and expense they would otherwise spend on bench work on targets that will ultimately fail.

In order to determine the role a potential drug target plays in a particular disease mechanism, we use DNA and protein chips. These chips can measure the amount of transcript or protein expressed by a cell at different times or in different states (healthy versus diseased). For an antibacterial target, the aim is a drug that is effective against many bacteria, killing them while causing no harm to the human. Clustering algorithms are used to organise the expression data into different biologically relevant clusters. We can then compare the expression profiles from the diseased and healthy cells to help us understand the role our gene or protein plays in a disease process. All of these computational tools can help to compose a detailed picture of a protein family, its involvement in a disease process and its potential as a possible drug target.

Need for the project

In ligand-based drug design, the identification of the bioactive conformation of a promising drug candidate molecule is of particular interest. The bioactive conformation is the ligand geometry that is favored to bind at the receptor. In the absence of structural information about the receptor, the prediction of the bioactive conformation of the ligand can be very challenging.

Conformational searching techniques are used to explore the conformational space of a ligand and locate minima on the ligand’s potential energy surface. Even in receptor-based drug design, where the receptor structure is known, techniques such as virtual screening, docking, and de novo design consider various conformations of a ligand. The analysis of the set of conformations of a ligand can provide insights into its physical, chemical, biological, or pharmacological properties. The primary objective of conformational analysis is to identify a set of putative bioactive conformations. Often, however, the number of conformations generated by a search method is very large and prohibits consideration of every conformation in the set as a receptor-binding geometry.

Hence, it becomes essential to use data reduction techniques such as clustering in order to first identify well-defined groups of conformations and then to select representative geometries from each group that can be used as putative bioactive conformations of the ligand. Conformational analysis, thus, has been an important technique for exploring molecular structure and relating it to molecular properties. Finding clusters is also useful for identifying common features among conformations and reducing the total number of conformations considered.

Proposed System

Designing molecules with appropriate chemical and biological properties is an essential key in drug design. With the currently available technologies, chemical structures of small molecules can be easily created using computer programs. A more challenging task, therefore, is the analysis of the chemical and biological properties of these created structures. Accurate prediction of chemical and biological properties using computer technology has many useful applications in the field of biotechnology, particularly in synthetic chemistry and drug analysis.

The basic idea is to utilize large existing pharmaceutical databases as input for a new type of structure/activity correlation methodology in order to calculate a large set of new and traditional descriptors to create improved Quantitative Structure-Activity Relationship (QSAR) models that characterize and predict important biological responses. This computational technique should be used to detect the functional group in the compound in order to refine the drug. This can be done using QSAR, which consists of computing every possible number that can describe a molecule and then doing an enormous curve fit to find out which aspects of the molecule correlate well with the drug activity or side-effect severity. This information can then be used to suggest new chemical modifications for synthesis and testing.
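The "curve fit" step of QSAR can be illustrated with a minimal sketch, assuming the simplest possible model: a linear least-squares fit of activity against numeric descriptors. The function names, the two descriptors and the toy values below are hypothetical and are not taken from the project; a real QSAR model would use many more descriptors and a validated fitting procedure.

```python
import numpy as np

def fit_qsar(descriptors, activity):
    """Fit activity ~ descriptors @ weights + intercept by ordinary least squares."""
    X = np.asarray(descriptors, dtype=float)
    y = np.asarray(activity, dtype=float)
    # Append a column of ones so the intercept is fitted together with the weights.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def predict_activity(descriptors, weights, intercept):
    """Predict the activity of new molecules from their descriptors."""
    return np.asarray(descriptors, dtype=float) @ weights + intercept

# Hypothetical toy data: 5 molecules x 2 descriptors (not real measurements).
X = np.array([[1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3], [5.0, 0.5]])
# Synthetic activities generated from a known linear rule, so the fit can be checked.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

w, b = fit_qsar(X, y)
new_molecule = np.array([[6.0, 0.2]])
predicted = predict_activity(new_molecule, w, b)
```

The fitted weights indicate which descriptors correlate with activity; in this synthetic case they recover the rule used to generate the data.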

Once the descriptors have been determined and a predictive model has been built, thousands of new potential molecules, chemically similar to those of the benchmark data set, are scanned from large databases and are evaluated for their chemical properties based on the predictive model. The aim is to target a few novel molecules with potentially attractive pharmaceutical properties that can then be tested further in the traditional way in the laboratory. Computationally intelligent data mining techniques are vital to extract the information necessary to select these novel molecules.

An algorithm is to be designed to predict the desired biological responses and generate QSAR models using both known (labeled) and unknown (unlabeled) biological responses. The ultimate pay-off of this methodology is expected to be the rapid invention of new drugs for new or known society-threatening diseases where a very fast response is warranted.

Why Fuzzy clustering?

Cluster analysis divides data into groups (clusters) such that similar data objects belong to the same cluster and dissimilar data objects to different clusters. The resulting data partition improves data understanding and reveals its internal structure. Partitional clustering algorithms are those that divide a data set into clusters or classes in this way.

In real applications there is very often no sharp boundary between clusters, so fuzzy clustering is often better suited to the data. Membership degrees between zero and one are used in fuzzy clustering instead of crisp assignments of the data to clusters. The most prominent fuzzy clustering algorithm is fuzzy c-means, a fuzzification of k-means. Simulated annealing with conformational energy as the clustering criterion and, more recently, multidimensional and metric scaling and fuzzy clustering have been used to cluster families of conformations, with promising results.
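As an illustration of how fuzzy c-means assigns membership degrees, here is a minimal NumPy sketch of the standard alternating update (cluster centres from memberships, then memberships from distances). The function name, the fuzzifier default m = 2, the stopping tolerance and the toy points are assumptions made for this example, not details of the project's algorithm.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return (centers, U), where U[i, k] is the membership of point i in cluster k."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row is normalised to sum to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Centres are means of the data weighted by memberships raised to m.
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance of every point to every centre (n x c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a centre
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Hypothetical toy data: two well-separated groups of 2D points.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centers, U = fuzzy_c_means(pts, c=2)
labels = U.argmax(axis=1)  # crisp labels, only for inspection
```

Each row of U sums to one, so a point lying between two groups receives intermediate memberships instead of being forced into a single cluster.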

Fuzzy clustering is a partition-based clustering scheme and is particularly useful when there are no apparent clear groupings in the data set. Partitioning schemes provide automatic detection of cluster boundaries and, in the case of fuzzy clustering, these cluster boundaries overlap. Every individual data entity (a conformer, in this case) belongs not to one but to all of the clusters, with varying degrees of membership. However, there are very few instances where a partitioning scheme has been used to analyze families of molecular conformations.

Areas of application of fuzzy cluster analysis include, for example, data analysis, pattern recognition, and image segmentation. The detection of special geometrical shapes like circles and ellipses can be achieved by so-called shell clustering algorithms. Fuzzy clustering belongs to the group of soft computing techniques (which include neural nets, fuzzy systems, and genetic algorithms).
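One way to use such memberships for data reduction is sketched below, assuming a membership matrix with one row per conformer and one column per cluster. The helper names, the margin value and the matrix entries are hypothetical; picking the highest-membership conformer as a cluster's representative is only one possible selection rule.

```python
import numpy as np

def representatives(U):
    """For each cluster (column), return the index of its highest-membership conformer."""
    return np.asarray(U, dtype=float).argmax(axis=0)

def ambiguous(U, margin=0.2):
    """Return indices of conformers whose two largest memberships differ by less than margin."""
    top2 = np.sort(np.asarray(U, dtype=float), axis=1)[:, -2:]
    return np.where(top2[:, 1] - top2[:, 0] < margin)[0]

# Hypothetical memberships of 4 conformers in 2 clusters (each row sums to 1).
U_demo = np.array([[0.90, 0.10],
                   [0.70, 0.30],
                   [0.45, 0.55],
                   [0.05, 0.95]])

reps = representatives(U_demo)   # one conformer index per cluster
unclear = ambiguous(U_demo)      # conformers lying between clusters
```

Conformers flagged as ambiguous sit in the overlap between clusters, which is the information a crisp partition would discard.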


Features

The process of designing a new drug using bioinformatics tools has opened a new area of research. By implementing rational drug design techniques, one can save much time and money in designing drugs. These techniques attempt to reproduce the researcher's understanding of how to choose likely compounds, built into a software package that is capable of modeling a very large number of compounds in an automated way. Many different algorithms have been used for this type of testing, many of which were adapted from artificial intelligence applications.

Our system should be able to

· Group similar structures together and relate to activity

Need to be able to organize thousands of active compounds into meaningful groups

· Computational representation of 2D structure

Need to be able to store chemical structure and biological data for millions of data points

· Apply statistical methods to the structures and related information

Need to learn as much information as possible (data mining)

· Need to use molecular modeling to gain direct chemical insight into reactions.

Conclusion

The basic idea is to utilize large existing pharmaceutical databases as input for a new type of structure/activity correlation methodology in order to calculate a large set of new and traditional descriptors to create improved QSAR models that characterize and predict important biological responses. This project involves the development of an infrastructure of computationally intelligent computer codes that allow for the virtual design of novel pharmaceuticals or the improvement of existing pharmaceuticals. The proposed methodology is applicable to most pharmaceuticals for which a database of responses is available. The ultimate pay-off of this methodology is expected to be the rapid invention of new drugs for new or known society-threatening diseases where a very fast response is warranted.



Abstract (version 0.1)


Status of the Project: The project requirements are ready. The detailed SRS is under preparation.


Fuzzy Clustering Based on Partial Charges of Molecules for Drug Analysis



Objective

To develop a system to create, manage and analyze chemical structures and their data in order to provide a convenient and straightforward approach for the analysis of chemical and biological data.


Introduction

All marketed drugs today target only about 500 gene products. The elucidation of the human genome, which has an estimated 30,000 to 40,000 genes, presents immense new opportunities for drug discovery and simultaneously creates a potential bottleneck regarding the choice of targets to support the drug discovery pipeline. The major advances in genomics and sequencing mean that finding an attractive target is no longer a problem; finding the targets that are most likely to succeed has become the challenge.

The focus of bioinformatics in the drug discovery process has therefore shifted from target identification to target validation. Many factors concerning a candidate target need to be taken into account, drawn from a multitude of heterogeneous resources. The types of information that one needs to gather about potential targets include many chemical and biological parameters such as nucleotide and protein sequencing information, homologues, mapping information, function prediction, pathway information, disease associations, variants, structural information, gene and protein expression data and species/taxonomic distribution, among others. Different bioinformatics tools can be used to gather this information. Accumulating this information about potential targets in databases means that pharmaceutical companies can save much of the time, effort and expense they would otherwise spend on bench work on targets that will ultimately fail.

The information that is gathered helps to characterise the different targets into families and subfamilies. It also classifies the behaviour of the different molecules in a biochemical and cellular context. Decisions about which families provide the best potential targets are guided by a number of criteria. It is important that the potential target has a suitable structure for interacting with drug molecules. Structural genomics helps to prioritise the families in terms of their 3D structures.



Proposed System

Designing molecules with appropriate chemical properties is an essential key in drug design. With the currently available technologies, chemical structures of small molecules can be easily created using computer programs. A more challenging task, therefore, is the analysis of the chemical properties of these created structures. Accurate prediction of chemical properties using computer technology has many useful applications in the field of biotechnology, particularly in synthetic chemistry and drug analysis. Molecular docking, which can broadly be defined as the prediction of the orientation of two molecules with respect to one another, is a computational technique that has been successfully used in the field of drug design.


Our system should be able to

· Computational representation of 2D structure


Need to be able to store chemical structure and biological data for millions of data points

· Group similar structures together and relate to activity


Need to be able to organize thousands of active compounds into meaningful groups

· Apply statistical methods to the structures and related information


Need to learn as much information as possible (data mining)

· Need to use molecular modeling to gain direct chemical insight into reactions.


Clustering algorithms are used to organise the expression data into different biologically relevant clusters. We can then compare the expression profiles from the diseased and healthy cells to help us understand the role our gene or protein plays in a disease process. All of these computational tools can help to compose a detailed picture about a protein family, its involvement in a disease process and its potential as a possible drug target.


To-Do list

1. Identify or design a database that consists of molecules and their corresponding partial charges. If none is available, locate datasets of functional groups.

2. Identify the format in which the data is stored in the database.

3. The data should be displayed as a 2D structure to the user.

4. Identify software/tools that will help the user to query an input structure along with its partial charge and display the resulting data as a 2D structure.
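The query part of the list above could be prototyped as follows. This is a sketch only, assuming a hypothetical single-table schema (atom_charges) in an in-memory SQLite database, with placeholder molecule names and charge values that are not computed or measured data. It covers looking up molecules by partial charge and leaves 2D display to whatever tool is eventually chosen.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical per-atom partial-charge table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE atom_charges ("
        "molecule TEXT, atom_index INTEGER, element TEXT, partial_charge REAL)"
    )
    # Placeholder rows for illustration only; the charges are made-up values.
    rows = [
        ("mol_A", 0, "O", -0.40),
        ("mol_A", 1, "C", 0.15),
        ("mol_B", 0, "O", -0.65),
        ("mol_B", 1, "H", 0.42),
        ("mol_C", 0, "O", -0.42),
    ]
    conn.executemany("INSERT INTO atom_charges VALUES (?, ?, ?, ?)", rows)
    return conn

def find_molecules(conn, element, charge, tolerance=0.05):
    """Return molecules having an atom of the given element with a charge within tolerance."""
    cur = conn.execute(
        "SELECT DISTINCT molecule FROM atom_charges "
        "WHERE element = ? AND partial_charge BETWEEN ? AND ? "
        "ORDER BY molecule",
        (element, charge - tolerance, charge + tolerance),
    )
    return [row[0] for row in cur.fetchall()]

conn = build_demo_db()
matches = find_molecules(conn, "O", -0.41)
```

Storing one row per atom keeps the charge lookup a plain range query; the tolerance parameter is an assumption, since computed charges from different methods rarely match exactly.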

To access the SVN repository

  svn://202.88.239.61/courses/2006/itm112-webtech-project/agrimarketinfo/Drugdesign

To know about the articles and online magazines we are referring to, go through the links


Some of the papers we are referring to are
