Machine learning in retail: how to deal with incomplete data
- The data format has changed. New internal systems or IT solutions, or a change in how data is collected (by day versus by transaction), can cause this.
- The data was initially collected for other purposes. For example, it may have been gathered so that top management could pay bonuses to category managers; such data is not suitable for the algorithms.
- The retailer has not been in the market long enough. As a result, its initial sales depend almost entirely on site traffic, which makes it difficult to analyze how prices affected sales during that period.
- The retailer has sales data for various departments or brands that covers only short time periods; algorithms cannot work properly with such mixed sales data.
- the available data is used as the training set;
- the missing data is used as the forecast targets.
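The split described above can be illustrated with a minimal sketch. It is not the method used by any particular retailer or vendor: the records are made up, the field names (`day`, `price`, `units`) and the helper functions are hypothetical, and a one-variable linear regression stands in for whatever model is actually used. Rows with known sales form the training set, and rows with missing sales become the forecast targets.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (assumes xs are not all equal)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def fill_missing_sales(records):
    """Return new records in which missing 'units' are predicted from 'price'."""
    # Training set: rows where sales are known.
    known = [r for r in records if r["units"] is not None]
    a, b = fit_line([r["price"] for r in known], [r["units"] for r in known])

    filled = []
    for r in records:
        if r["units"] is None:
            # Forecast target: a row where sales are missing.
            filled.append({**r, "units": a + b * r["price"], "imputed": True})
        else:
            filled.append({**r, "imputed": False})
    return filled


# Made-up daily records; None marks days with no usable sales data.
records = [
    {"day": 1, "price": 10.0, "units": 100.0},
    {"day": 2, "price": 12.0, "units": 90.0},
    {"day": 3, "price": 14.0, "units": None},
    {"day": 4, "price": 16.0, "units": 70.0},
    {"day": 5, "price": 18.0, "units": None},
]

filled = fill_missing_sales(records)
for row in filled:
    print(row["day"], round(row["units"], 2), row["imputed"])
```

Marking restored rows with an `imputed` flag keeps them distinguishable from observed sales, so later analysis can weight or exclude them as needed.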