A Novel Drift Detection Algorithm Based on Features’ Importance Analysis in a Data Streams Environment

Open Access | Jun 2020

Language: English
Page range: 287 - 298
Submitted on: Nov 5, 2019
Accepted on: May 18, 2020
Published on: Jun 15, 2020
Published by: SAN University
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2020 Piotr Duda, Krzysztof Przybyszewski, Lipo Wang, published by SAN University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.