Data mining consists of an evolving set of techniques that can be used to extract valuable information and knowledge from massive volumes of data. Data mining research and tools have focused on commercial sector applications; comparatively little data mining research has focused on scientific data. This paper aims to further the study of data mining on scientific data. It highlights data mining techniques applied to detect surface changes over time (e.g., earthquake rupture). These techniques can help researchers predict changes in the intensity of volcanic activity.
This paper uses predictive statistical models that can be applied to areas such as seismic activity or the spread of fire. The basic problem with this class of systems is that their dynamics are usually unobservable, as is the case for earthquakes. However, the space-time patterns associated with the time, location, and magnitude of the sudden events that occur when a force threshold is exceeded are observable. This paper highlights how observable space-time earthquake patterns arising from unobservable dynamics can be studied using data mining techniques, pattern recognition, and ensemble forecasting. It thus gives insight into how data mining can be applied to determine the consequences of earthquakes and to warn the scientific community, which can in turn alert the public.
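The idea of observable events driven by unobservable dynamics can be illustrated with a toy threshold model. The sketch below is a minimal illustration under stated assumptions, not the model used in this paper: a hidden "load" grows by random increments, an event is recorded only when the load crosses a threshold, and the load then resets. The function names `simulate_event_times` and `ensemble_forecast`, and all parameter values, are hypothetical. The ensemble forecast runs many members whose unknown starting load is drawn at random and reports the fraction that produce an event within the forecast horizon.

```python
import random


def simulate_event_times(n_steps, threshold=1.0, max_increment=0.1, seed=0):
    """Hypothetical toy threshold model.

    A hidden load grows by random increments and resets to zero whenever it
    reaches `threshold`. Only the step indices of the resulting events are
    returned; the load itself is never exposed to the caller.
    """
    rng = random.Random(seed)
    load, events = 0.0, []
    for t in range(n_steps):
        load += rng.uniform(0.0, max_increment)
        if load >= threshold:
            events.append(t)  # observable: when the sudden event happened
            load = 0.0        # unobservable: the internal state resets
    return events


def ensemble_forecast(horizon, n_members=1000, threshold=1.0,
                      min_increment=0.0, max_increment=0.1, seed=0):
    """Estimate P(at least one event within `horizon` steps).

    Each ensemble member draws its own starting load, because the true
    hidden state is assumed to be unknown.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_members):
        load = rng.uniform(0.0, threshold)  # sampled, not observed
        for _ in range(horizon):
            load += rng.uniform(min_increment, max_increment)
            if load >= threshold:
                hits += 1
                break
    return hits / n_members
```

For example, `ensemble_forecast(horizon=10)` estimates the probability of at least one event in the next ten steps. Because each member starts from its own sampled load, the spread of the ensemble reflects uncertainty in the hidden state rather than a single guess.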
• Data mining is defined as an information extraction activity whose goal is to discover hidden facts contained in databases.
• It refers to finding new knowledge about an application domain using data on that domain, usually stored in a database. The application domain may be astrophysics, earth science, or solar system science.
• It is a variety of techniques for identifying nuggets of information or decision-making knowledge in bodies of data and extracting them in such a way that they can be put to use in areas such as decision support, prediction, forecasting, and estimation.
DATA MINING GOALS:
• Bring together representatives of the data mining community and the domain science community so that each can begin to understand the other's current capabilities and research objectives related to data mining.
• Identify a set of research objectives from the domain science community that would be facilitated by current or anticipated data mining techniques.
• Identify a set of research objectives for the data mining community that could support the research objectives of the domain science community.
DATA MINING MODELS:
Data mining is used to find patterns and relationships in data. These patterns and relationships can be analyzed via two types of models:
1. Descriptive models: Used to describe patterns and to create meaningful subgroups or clusters.
2. Predictive models: Used to forecast explicit values based on patterns in known results.
** This paper focuses on predictive models.
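To make the distinction between the two model types concrete, here is a minimal pure-Python sketch, assuming one-dimensional numeric data. It uses 1-D k-means clustering as an example of a descriptive model (grouping values into subgroups) and an ordinary least-squares line as an example of a predictive model (forecasting a value from known results). Both are common textbook choices picked for illustration; the paper does not specify these particular algorithms, and the function names are hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Descriptive model: partition 1-D values into k clusters (Lloyd's algorithm).

    Assumes len(values) >= k.
    """
    ordered = sorted(values)
    # Initialize centers at evenly spaced positions in the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // max(1, k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Move each center to its cluster mean; keep the old center if empty.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters


def fit_line(xs, ys):
    """Predictive model: ordinary least-squares fit of y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b
```

Once `a` and `b` have been fitted on known results, `a + b * x_new` forecasts the value for an unseen input `x_new`, whereas the clusters from `kmeans_1d` only describe structure already present in the data.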
In large databases, data mining and knowledge discovery come in two flavors:
1. Event-based mining:
• Known events/known algorithms: Use existing physical models (descriptive models and algorithms) to locate known phenomena of interest either spatially or temporally within a large database.
• Known events/unknown algorithms: Use pattern recognition and clustering properties of data to discover new observational (physical) relationships (algorithms) among known phenomena.
• Unknown events/known algorithms: Use expected physical relationships (predictive models, algorithms) among observational parameters of physical phenomena to predict the presence of previously unseen events within a large, complex database.
• Unknown events/unknown algorithms: Use thresholds or trends to identify transient or otherwise unique events and therefore to discover new physical phenomena.
** This paper focuses on unknown events and known algorithms.
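One widely used example of applying a known algorithm to find previously unseen events in a continuous record is the short-term-average/long-term-average (STA/LTA) trigger from seismology. It is offered here as an illustrative substitute, not as the method used in this paper. The sketch below is a simple variant that assumes trailing windows ending at the same sample, absolute amplitudes, and `sta_len <= lta_len`; the window lengths and trigger threshold are hypothetical defaults that would need tuning for real data.

```python
def sta_lta(signal, sta_len, lta_len):
    """Ratio of short-term to long-term average absolute amplitude.

    Both windows trail and end at the current sample. Entries are None
    until the long window is full (or if the long-term average is zero).
    """
    ratios = []
    for i in range(len(signal)):
        if i + 1 < lta_len:
            ratios.append(None)
            continue
        sta = sum(abs(x) for x in signal[i + 1 - sta_len:i + 1]) / sta_len
        lta = sum(abs(x) for x in signal[i + 1 - lta_len:i + 1]) / lta_len
        ratios.append(sta / lta if lta > 0 else None)
    return ratios


def detect_events(signal, sta_len=5, lta_len=50, threshold=3.0):
    """Return sample indices where the STA/LTA ratio first rises to the threshold."""
    triggers, active = [], False
    for i, r in enumerate(sta_lta(signal, sta_len, lta_len)):
        if r is not None and r >= threshold and not active:
            triggers.append(i)  # start of a newly detected event
            active = True
        elif r is None or r < threshold:
            active = False      # re-arm once the ratio drops back
    return triggers
```

The trigger re-arms only after the ratio falls back below the threshold, so a single burst produces one detection rather than one per sample. The nested sums make this O(n × `lta_len`); a production version would maintain running sums instead.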
2. Relationship-based mining:
• Spatial Associations: Identify events (e.g., astronomical objects) at the same location (e.g., the same region of the sky).
• Temporal Associations: Identify events occurring during the same or related periods of time.
• Coincidence Associations: Use clustering techniques to identify events that are co-located within a multi-dimensional parameter space.
** This paper focuses on all forms of relationship-based mining.
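Spatial and temporal associations can be combined by treating "same location" and "same period" as distance and time tolerances. The following minimal sketch assumes each event is a hypothetical (latitude, longitude, time in hours) tuple and uses the haversine great-circle distance on a spherical Earth with a mean radius of 6371 km; `max_km` and `max_hours` are illustrative parameters, not values from the paper.

```python
import math
from itertools import combinations


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    # Clamp to guard against rounding pushing `a` slightly above 1.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def associate(events, max_km, max_hours):
    """Return index pairs of events close in both space and time.

    Each event is assumed to be a (latitude, longitude, time_in_hours) tuple.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(events), 2):
        close_in_space = haversine_km(a[0], a[1], b[0], b[1]) <= max_km
        close_in_time = abs(a[2] - b[2]) <= max_hours
        if close_in_space and close_in_time:
            pairs.append((i, j))
    return pairs
```

The pairwise loop is O(n²), which is acceptable for a sketch; large catalogs would normally use a spatial index such as a k-d tree to avoid comparing every pair.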
USER REQUIREMENTS FOR DATA MINING IN LARGE SCIENTIFIC DATABASES:
• Cross-identification: Refers to the classical problem of associating the source list in one database with the source list in another.
• Cross-correlation: Refers to the search for correlations, tendencies, and trends between physical parameters in multidimensional data, usually across databases.
• Nearest-neighbor identification: Refers to the general application of clustering algorithms in multidimensional parameter space, usually within a database.
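Cross-identification and cross-correlation can be sketched together: first match each source in one list to its nearest neighbor in another list within a tolerance, then correlate a physical parameter across the matched pairs. This minimal sketch assumes sources are (x, y) positions in a common planar coordinate system (angular positions on the sky or on Earth would need a spherical distance instead), that `tolerance` is chosen by the user, and that neither correlated sequence is constant. It does not enforce one-to-one matching, and the function names are hypothetical.

```python
import math


def cross_identify(list_a, list_b, tolerance):
    """Match each source in list_a to its nearest neighbor in list_b.

    A match is kept only if the distance is within `tolerance`.
    Returns (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, (xa, ya) in enumerate(list_a):
        best_j, best_d = None, float("inf")
        for j, (xb, yb) in enumerate(list_b):
            d = math.hypot(xa - xb, ya - yb)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d <= tolerance:
            matches.append((i, best_j))
    return matches


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

After matching, one could gather a parameter for each matched pair (say, a brightness recorded in each database) and pass the two resulting lists to `pearson` to check whether they are related.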