Study of Biological Data using Naïve Bayesian Classification

By Technology Half year ago

“Data mining refers to extracting or ‘mining’ knowledge from large amounts of data.” Data mining might more appropriately have been named knowledge mining from data. Many other terms carry a similar or slightly different meaning, such as knowledge mining from databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.

Because of their predictive power, data mining techniques have been widely used in diagnostic and health care applications. Data mining algorithms can learn from past examples in clinical data and model the oftentimes non-linear relationships between the independent and dependent variables. The resulting model represents formalized knowledge, which can often provide a good diagnostic opinion.

Classification is the most widely used technique in medical data mining, and it has been applied extensively to the analysis of biomedical databases.

“Current research in the area of classification mining tackles the study of biological databases, the analysis and validation of graph representations, and the use of the Naïve Bayesian classification technique for hypotheses and the modeling of hidden patterns.”

The steps for classification mining are:

Preprocessing & Transformation of Biological Data: To mine large biological databases, it is necessary to preprocess the biological data and store the information in a database, which is more appropriate for further processing than a plain text file.

Development of Mathematical Data Model: After preprocessing, apply Naïve Bayesian classification. Bayesian classifiers have exhibited high accuracy and speed when applied to large databases, and the technique is suited to large categorical and numerical data. We want to compare this method with other classification algorithms and find out which gives better results for biological data.

Implementation & Validation: We can implement a user interface for this idea using the Oracle 10g suite or ASP.NET 2005, and any database can be used for preprocessing the data. We validate the results using k-fold cross-validation to find the optimal results.
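The modeling and validation steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation described in this article: it assumes purely categorical features, uses Laplace smoothing (an added assumption), and runs on a tiny invented data set. The names train_nb, predict_nb, k_fold_accuracy, ROWS and LABELS, as well as the feature values, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model (alpha is the Laplace smoothing constant)."""
    class_counts = Counter(labels)        # number of records per class
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    domains = defaultdict(set)            # feature index -> values seen in training
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            domains[j].add(value)
    return class_counts, value_counts, domains, alpha


def predict_nb(model, row):
    """Return the class with the highest log posterior score for one record."""
    class_counts, value_counts, domains, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior P(class)
        for j, value in enumerate(row):
            # Smoothed estimate of P(value | class); never zero, even for unseen values.
            count = value_counts[(j, label)][value]
            score += math.log((count + alpha) / (n_label + alpha * len(domains[j])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(rows, labels, k=3):
    """Estimate accuracy with k-fold cross-validation (every k-th record forms a fold)."""
    n = len(rows)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_rows = [rows[i] for i in range(n) if i not in test_idx]
        train_labels = [labels[i] for i in range(n) if i not in test_idx]
        model = train_nb(train_rows, train_labels)
        correct += sum(predict_nb(model, rows[i]) == labels[i] for i in test_idx)
    return correct / n


# Invented toy records: (expression level, mutation present) -> class label.
ROWS = [
    ("high", "yes"), ("high", "yes"), ("high", "no"), ("low", "no"),
    ("low", "no"), ("low", "yes"), ("high", "yes"), ("low", "no"),
    ("high", "no"), ("low", "no"), ("high", "yes"), ("low", "yes"),
]
LABELS = [
    "disease", "disease", "disease", "healthy",
    "healthy", "healthy", "disease", "healthy",
    "disease", "healthy", "disease", "healthy",
]

print("3-fold accuracy:", k_fold_accuracy(ROWS, LABELS, k=3))
```

In practice the records would be read from the database built in the preprocessing step rather than hard-coded, and numerical attributes would need either discretization or a Gaussian variant of the classifier.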

Reasons for Bayesian classification mining in the biomedical informatics field

The field of biomedical informatics has attracted increasing attention and has been growing rapidly over the past two decades. Due to advances in new molecular, genomic, and biomedical techniques and applications such as genome sequencing, protein identification, medical imaging, and patient medical records, tremendous amounts of biomedical research data are generated every day. Originating from individual research efforts and clinical practices, these biomedical data are available in hundreds of public and private databases, which have been made possible by new database technologies and the Internet.

Biomedical researchers and practitioners are now facing the “info-glut” problem. Currently, the rate of data accumulation is much faster than the rate of data interpretation. These data need to be effectively organized and analyzed in order to be useful.
New computational techniques and information technologies are needed to manage these large repositories of biomedical data and to discover useful patterns and knowledge from them. In particular, knowledge management, data mining, and text mining techniques have been adopted in various successful biomedical applications in recent years.
Data preprocessing is another major problem, and efficient tools need to be developed to handle it.

Source: ...
