Chemical Technology, Control and Management

Abstract

This paper develops an analytical method for determining informative sets of features (INP) that takes into account the resource for criteria based on a measure of dispersion of the classified objects. The domains in which a solution exists are defined. Statements and properties of the Fisher-type informativeness criterion are proved; with them, the proposed analytical method for determining the INP guarantees optimal results in the sense of maximizing the selected functional. The choice of this type of informativeness criterion is justified. A method for transforming features is proposed, the universality of the method with respect to feature type is shown, and an algorithm implementing it is given. The paper also discusses the worldwide growth in the volume of information, problems related to big data, and the problems and tasks of data pre-processing. It demonstrates the relevance of reducing the dimension of the feature space so that data can be processed and visualized without unnecessary difficulty, and it shows the disadvantages of existing methods and algorithms for selecting an informative set of features.
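
For orientation only, the idea of a Fisher-type criterion built on a dispersion measure can be sketched as follows. This is a minimal illustration and not the paper's analytical method. It assumes a common two-class form of the criterion, the squared gap between class means divided by the summed within-class scatter, and it ranks features one at a time, which is a simplification. The function names `fisher_scores` and `select_informative` and the sample data are hypothetical.

```python
from statistics import fmean


def fisher_scores(class_a, class_b):
    """Per-feature Fisher-type score for two classes (assumed form):
    squared gap between class means / summed within-class scatter."""
    n_features = len(class_a[0])
    scores = []
    for j in range(n_features):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        mean_a, mean_b = fmean(col_a), fmean(col_b)
        gap = (mean_a - mean_b) ** 2
        scatter = (sum((x - mean_a) ** 2 for x in col_a)
                   + sum((x - mean_b) ** 2 for x in col_b))
        if scatter > 0:
            scores.append(gap / scatter)
        else:
            # Zero scatter: the feature is constant within each class.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores


def select_informative(class_a, class_b, size):
    """Return the indices of the `size` highest-scoring features, sorted."""
    scores = fisher_scores(class_a, class_b)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:size])


# Hypothetical data: rows are objects, columns are features.
class_a = [[1.0, 5.0, 0.2], [1.2, 3.0, 0.1], [0.8, 4.0, 0.3]]
class_b = [[3.0, 4.5, 0.4], [3.2, 3.5, 0.3], [2.8, 4.0, 0.5]]

print(fisher_scores(class_a, class_b))
print(select_informative(class_a, class_b, size=2))
```

In this toy data the second feature has identical class means, so its score is zero and it is left out of a two-feature subset. Ranking features individually does not in general maximize a ratio-of-sums functional over a whole subset; the abstract states that the paper's method provides that guarantee.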

First Page

57

Last Page

64

DOI

https://doi.org/10.34920/2020.4.57-64

