Descriptive analysis

Before trying any machine learning solution, it's necessary to create an abstract description of the context. The best way to achieve this goal is to define a mathematical model, which has the advantage of being immediately comprehensible to anybody (assuming some basic knowledge). The goal of descriptive analysis is to arrive at an accurate description of the observed phenomena and to validate all the hypotheses. Let's suppose that our task is to optimize the supply chain of a large store. We start collecting data about purchases and sales and, after a discussion with a manager, we formulate the generic hypothesis that the sales volume increases on the day before the weekend. This means that our model should be based on a periodicity. A descriptive analysis has the task of validating this hypothesis, but also of discovering all the other specific features that were initially neglected.
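As a minimal sketch of how the manager's hypothesis could be checked, we can average the sales by day of the week and see which day stands out. Everything here is an assumption made for illustration: the data is synthetic, the series is assumed to start on a Monday, the weekend is assumed to be Saturday and Sunday (so the day before it is Friday), and the helper `mean_by_weekday` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily sales for one year of full weeks (illustrative data only).
# Day index 0 = Monday, ..., 4 = Friday, ..., 6 = Sunday.
n_weeks = 52
base_volume = 100.0
weekday_effect = np.array([0.0, 0.0, 0.0, 0.0, 40.0, 0.0, 0.0])  # assumed Friday boost
sales = (
    base_volume
    + np.tile(weekday_effect, n_weeks)
    + rng.normal(0.0, 5.0, size=7 * n_weeks)
)


def mean_by_weekday(series):
    """Average a daily series per day of the week.

    Assumes the series starts on a Monday and contains only full weeks.
    """
    return series.reshape(-1, 7).mean(axis=0)


means = mean_by_weekday(sales)
peak_day = int(np.argmax(means))

print("Mean sales per weekday:", means.round(1))
print("Day with the highest mean (0 = Monday):", peak_day)
```

With real data, the same table of weekday means is also a quick way to spot features that the initial hypothesis neglected, such as a second, smaller peak on another day.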

At the end of this stage, we should know, for example, if the time series (let's suppose we consider only one variable) is periodic, if it has a trend, if it's possible to find a set of standard rules, and so forth. A further step (that I prefer to consider as a whole with this one) is to define a diagnostic model that must be able to connect all the effects with precise causes. This process seems to go in the opposite direction, but its goal is very close to that of descriptive analysis. In fact, whenever we describe a phenomenon, we are naturally driven to find a rational reason that justifies each specific step. Let's suppose that, after having observed the periodicity in our time series, we find a sequence that doesn't obey this rule. The goal of diagnostic analysis is to give a suitable answer (for example, the store is open on Sunday). This new piece of information enriches our knowledge and specializes it: now, we can state that the series is periodic only when there is a day off, and therefore (clearly, this is a trivial example) we don't expect an increase in the sales before a working day.

As many machine learning models have specific prerequisites, a descriptive analysis allows us to immediately understand whether a model will perform poorly or if it's the best choice considering all the known factors. In all of the examples we will look at, we are going to perform a brief descriptive analysis by defining the features of each dataset and what we can observe. As the goal of this book is to focus on adaptive systems, we don't have space for a complete description, but I always invite the reader to imagine new possible scenarios, performing a virtual analysis before defining the models.
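To make the questions about trend and periodicity more concrete, the following sketch estimates a linear trend with a least-squares fit and then looks for a period in the sample autocorrelation of the detrended series. It is only one possible approach under stated assumptions: the data is synthetic (an upward trend plus a weekly peak plus noise), the trend is assumed to be linear, and the helper `autocorr` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily series (illustrative data only):
# slow upward trend + a peak every seventh day + Gaussian noise.
n_days = 7 * 52
t = np.arange(n_days)
sales = 100.0 + 0.05 * t + 20.0 * (t % 7 == 4) + rng.normal(0.0, 3.0, size=n_days)

# Trend: slope of the least-squares line through the series.
slope, intercept = np.polyfit(t, sales, deg=1)

# Remove the estimated trend before looking for a periodicity.
detrended = sales - (slope * t + intercept)


def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag (biased estimator)."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


# Autocorrelation for lags of 1 to 10 days; a strong peak suggests a period.
acf = {lag: autocorr(detrended, lag) for lag in range(1, 11)}
best_lag = max(acf, key=acf.get)

print("Estimated slope per day:", round(slope, 3))
print("Lag with the highest autocorrelation:", best_lag)
```

A sequence that breaks the weekly pattern (for example, a week in which the store is open on Sunday) would show up as large residuals in `detrended` around those days, which is where a diagnostic analysis would start looking for a cause.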