We are overwhelmed with data. The amount of data in the world, and in our lives, seems to go on increasing, and there’s no end in sight. Omnipresent personal computers make it too easy to save things that previously we would have trashed. Inexpensive multi-gigabyte disks make it too easy to postpone decisions about what to do with all this stuff: we simply buy another disk and keep it all.
The World Wide Web overwhelms us with information. Meanwhile, every choice we make is recorded. And all these are just personal choices: they have countless counterparts in the world of commerce and industry. We would all testify to the growing gap between the generation of data and our understanding of it.
As the volume of data increases inexorably, the proportion of it that people understand decreases alarmingly. Lying hidden in all this data is information, potentially useful information, that is rarely made explicit or taken advantage of.
People have been seeking patterns in data since human life began. Hunters seek patterns in animal migration behavior, farmers seek patterns in crop growth, politicians seek patterns in voter opinion, and lovers seek patterns in their partners’ responses. A scientist’s job is to make sense of data: to discover the patterns that govern how the physical world works and encapsulate them in theories that can be used for predicting what will happen in new situations. The entrepreneur’s job is to identify opportunities, that is, patterns in behavior that can be turned into a profitable business, and exploit them.
Economists, statisticians, forecasters, and communication engineers have long worked with the idea that patterns in data can be sought automatically, identified, validated, and used for prediction.
As the world grows in complexity, overwhelming us with the data it generates, data mining becomes our only hope for elucidating the patterns that underlie it. Intelligently analyzed data is a valuable resource. It can lead to new insights and, in commercial settings, to competitive advantages.
Data mining is about solving problems by analyzing data already present in databases.
Take fickle customer loyalty as an example of such a problem. A database of customer choices, along with customer profiles, holds the key to it. Patterns of behavior of former customers can be analyzed to identify distinguishing characteristics of those likely to switch products and those likely to remain loyal. Once such characteristics are found, they can be put to work to identify present customers who are likely to jump ship. This group can be targeted for special treatment, treatment too costly to apply to the customer base as a whole. More positively, the same techniques can be used to identify customers who might be attracted to another service the enterprise provides, one they are not presently enjoying, and to target them for special offers that promote this service.
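The idea of finding distinguishing characteristics of former customers and applying them to present ones can be sketched in a few lines of Python. Everything here is hypothetical: the records, the attribute names (`plan`, `support_calls`), and the 0.6 threshold are invented for illustration, and a real study would use far more data and a proper learning method.

```python
from collections import defaultdict

# Hypothetical records of former customers (invented for illustration).
former_customers = [
    {"plan": "prepaid", "support_calls": "many", "switched": True},
    {"plan": "prepaid", "support_calls": "few", "switched": True},
    {"plan": "contract", "support_calls": "many", "switched": True},
    {"plan": "contract", "support_calls": "few", "switched": False},
    {"plan": "contract", "support_calls": "few", "switched": False},
    {"plan": "prepaid", "support_calls": "few", "switched": False},
]

# Hypothetical present customers whose future behavior is unknown.
present_customers = [
    {"name": "A", "plan": "prepaid", "support_calls": "few"},
    {"name": "B", "plan": "contract", "support_calls": "many"},
]


def switch_rates(records, attribute):
    """Return the fraction of customers who switched, per value of one attribute."""
    counts = defaultdict(lambda: [0, 0])  # value -> [number switched, total]
    for record in records:
        tally = counts[record[attribute]]
        tally[1] += 1
        if record["switched"]:
            tally[0] += 1
    return {value: s / t for value, (s, t) in counts.items()}


def at_risk(customers, records, attribute, threshold=0.6):
    """Select customers whose attribute value had a high switch rate in the past."""
    rates = switch_rates(records, attribute)
    return [c for c in customers if rates.get(c[attribute], 0.0) >= threshold]


flagged = at_risk(present_customers, former_customers, "support_calls")
print([c["name"] for c in flagged])
```

Looking at one attribute at a time is the simplest possible choice. It keeps the resulting pattern easy to read, at the cost of missing interactions between attributes.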
In today’s highly competitive, customer-centered, service-oriented economy, data is the raw material that fuels business growth, if only it can be mined.
How are the patterns expressed? Useful patterns allow us to make nontrivial predictions on new data. There are two extremes for the expression of a pattern: as a black box whose innards are effectively incomprehensible, and as a transparent box whose construction reveals the structure of the pattern.
Both, we are assuming, make good predictions. The difference is whether or not the patterns that are mined are represented in terms of a structure that can be examined, reasoned about, and used to inform future decisions.
Such patterns we call structural because they capture the decision structure in an explicit way. In other words, they help to explain something about the data.
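To make the "transparent box" idea concrete, here is a minimal sketch in which the pattern is a list of if-then rules stored as plain data, so the same structure that makes the prediction can also be printed and inspected. The rules, attribute names, and labels are hypothetical, not mined from real data.

```python
# Each rule pairs the conditions that must all hold with a predicted label.
# These rules are invented for illustration only.
RULES = [
    ({"support_calls": "many"}, "switch"),
    ({"plan": "prepaid", "tenure": "short"}, "switch"),
]
DEFAULT = "stay"  # prediction when no rule applies


def predict(example):
    """Return (label, explanation) from the first rule whose conditions all match."""
    for conditions, label in RULES:
        if all(example.get(key) == value for key, value in conditions.items()):
            reason = " and ".join(f"{k} = {v}" for k, v in conditions.items())
            return label, f"if {reason} then {label}"
    return DEFAULT, f"otherwise {DEFAULT}"


label, why = predict({"plan": "prepaid", "tenure": "short", "support_calls": "few"})
print(label, "|", why)
```

A black-box predictor could return the same label, but it could not return the second value: the explanation comes for free only because the decision structure is explicit.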
Structural patterns
The rules do not really generalize from the data. They merely summarize it. In most learning situations, the set of examples given as input is far from complete, and part of the job is to generalize to other, new examples.
Real-life datasets invariably contain examples in which the values of some features, for some reason or other, are unknown. For example, measurements were not taken or were lost.
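As a small sketch of what unknown values look like in practice, the snippet below marks them with `None`, counts them per feature, and fills one feature with its most common known value. The examples and feature names are hypothetical, and filling with the most common value is only one possible way to handle the gaps, chosen here for brevity.

```python
from collections import Counter

# Hypothetical examples; None marks a value that was not taken or was lost.
examples = [
    {"age": "young", "tear_rate": "normal"},
    {"age": None, "tear_rate": "reduced"},
    {"age": "young", "tear_rate": None},
    {"age": "old", "tear_rate": "normal"},
]


def missing_counts(rows):
    """Count how many examples have an unknown value for each feature."""
    counts = {}
    for row in rows:
        for feature, value in row.items():
            counts[feature] = counts.get(feature, 0) + (1 if value is None else 0)
    return counts


def fill_with_most_common(rows, feature):
    """Return copies of the rows with unknown values of one feature filled in."""
    known = [row[feature] for row in rows if row[feature] is not None]
    if not known:  # nothing to learn a replacement from
        return [dict(row) for row in rows]
    replacement = Counter(known).most_common(1)[0][0]
    return [
        {**row, feature: replacement if row[feature] is None else row[feature]}
        for row in rows
    ]


print(missing_counts(examples))
filled = fill_with_most_common(examples, "age")
```

The original list is left untouched, so the filled and unfilled versions can be compared side by side.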
Machine learning
Earlier we defined data mining operationally as the process of discovering patterns, automatically or semi-automatically, in large quantities of data, where the patterns must be useful. An operational definition can be formulated in the same way for learning.
Things learn when they change their behavior in a way that makes them perform better in the future.
This ties learning to performance rather than knowledge. You can test learning by observing the behavior and comparing it with past behavior. This is a much more objective kind of definition and appears to be far more satisfactory.
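This performance-based view of learning can be sketched as a before-and-after comparison. The learner below is deliberately trivial and entirely hypothetical: it predicts whichever label it has seen most often, and "learning" is judged only by whether its accuracy on some test labels improves after it has observed examples.

```python
from collections import Counter


class MajorityLearner:
    """Toy learner that predicts the label it has observed most often."""

    def __init__(self, default):
        self.counts = Counter()
        self.default = default  # prediction before anything has been observed

    def observe(self, label):
        self.counts[label] += 1

    def predict(self):
        if not self.counts:
            return self.default
        return self.counts.most_common(1)[0][0]


def accuracy(learner, labels):
    """Fraction of labels that the learner's current prediction gets right."""
    return sum(learner.predict() == label for label in labels) / len(labels)


# Hypothetical labels, invented for illustration.
test_labels = ["stay", "stay", "stay", "switch"]
learner = MajorityLearner(default="switch")

before = accuracy(learner, test_labels)
for label in ["stay", "stay", "switch", "stay"]:
    learner.observe(label)
after = accuracy(learner, test_labels)

learned = after > before  # behavior changed in a way that performs better
print(before, after, learned)
```

Nothing in the test asks what the learner "knows"; it only compares present behavior with past behavior, which is what makes the definition objective.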
Data mining
Data mining is a practical topic and involves learning in a practical, not a theoretical, sense.
We are interested in techniques for finding and describing structural patterns in data as a tool for helping to explain that data and make predictions from it. The data will take the form of a set of examples, such as customers who have switched loyalties or situations in which certain kinds of contact lenses can be prescribed.
The output takes the form of predictions about new examples: a prediction of whether a particular customer will switch, or a prediction of what kind of lens will be prescribed under given circumstances.
People frequently use data mining to gain knowledge, not just predictions. Gaining knowledge from data certainly sounds like a good idea if you can do it.
To conclude: if you want to know more about data mining, I have made a video defining it, and you will find useful information in it 🙂