Abstract of PhD Thesis

Intelligent Data Processing and Its Applications

Aniko Szilvia Vanger

1 Introduction

Nowadays the rapidly increasing performance of hardware and efficient, intelligent algorithms enable us to store and process big data. This trend opens up more and more opportunities to extract information from large amounts of data. My thesis is only a precursor to this topic, because I did not have sufficient hardware and had only a small amount of data to process. However, all the topics of my thesis belong to intelligent data processing.

In Chapter 2 of my thesis I introduce a new clustering algorithm named GridOPTICS, whose goal is to accelerate the well-known OPTICS density-based clustering technique. Density-based clustering techniques are capable of recognizing arbitrarily shaped clusters in a point set. DBSCAN produces only one clustering, whereas OPTICS generates a reachability plot from which many clusterings can be read without having to execute the whole algorithm again. I found that OPTICS is very slow for large data sets, so I wanted to find a way to accelerate it. To check whether GridOPTICS is faster than OPTICS, I executed both algorithms on several point sets.

In Chapter 3 of my thesis I introduce two new modules of the Cardiospy system of Labtech Ltd. On these two projects I worked together with Istvan Juhasz, Laszlo Farkas, Peter Toth, and four students of the university: Jozsef Kuk, Adam Balazs, Bela Vamosi, and David Angyal. Bela Kincs, who was the executive of Labtech Ltd., wanted the Cardiospy system to be improved. He and his team surveyed what users in this area needed and how their software could be improved. Labtech Ltd. and the University of Debrecen worked together on two projects.
In both cases Labtech had early solutions for the algorithms, but they were slow, and their results were either insufficient or could not be validated. Moreover, there were no visualization tools for either problem. The tasks of the University of Debrecen team were to provide a fast algorithm and to create an interactive visualization interface for each problem.

The goal of the first module of Cardiospy is to cluster and visualize long (up to 24-hour) recordings of ECG signals, because the manual evaluation of long recordings is a lengthy and tedious task. During this project I recognized that accelerating OPTICS with a grid clustering method is an interesting topic in its own right, independent of ECG signals.

The goal of the second module of Cardiospy is to calculate and visualize the steps of the blood pressure measurement and the blood pressure values. The recordings (each of which can contain a sequence of measurements) are collected by a microcontroller, but this module runs on a PC. With the help of the application, physicians can recognize the types of errors in the measurements and can also find the noisy measurements.

In Chapter 4 I introduce how I applied an active learning method in a course on database programming. I taught Oracle SQL and PL/SQL in the Advanced DBMS 1 course, and I saw that the students did not practice at home. The prerequisites of this course are the Programming language and the Database systems courses, so the students are not absolute beginners in the field. I wanted to make the students try out the programming tools independently, but with the help of the teacher. To support the active learning method, an application had to be built. The application helps the teacher organize and monitor the tasks and the students' solutions. Moreover, the application can verify the syntax of the solutions before the students upload them.
If the syntax is wrong, the student cannot upload the solution. This feature makes the teacher's task easier. To assess whether the active learning method works, I gathered and examined the results of the students during the 3 years when I used this method.

2 New results

The abstract of the thesis presents new results grouped into four main statements. The first statement deals with a clustering method, GridOPTICS. The second demonstrates an application of this clustering method, namely the clustering of ECG signals. The third statement introduces the visualization of the steps of the blood pressure measurement, whereas the last statement demonstrates how the students' solutions can easily be managed when an active learning method is used for database programming.

2.1 A clustering algorithm

Cluster analysis is an important research field of data mining, which is applied in many other disciplines, such as pattern recognition, image processing, machine learning, bioinformatics, information retrieval, artificial intelligence, marketing, and psychology. Density-based clustering approaches are capable of finding arbitrarily shaped clusters, but they have a disadvantage: it is hard to choose parameter values for which the algorithm gives an appropriate result (Gan et al., 2007). The OPTICS (Ankerst et al., 1999) clustering algorithm gives not just one result but a set of results. It builds a reachability plot: it orders the input points and assigns a reachability distance to each of them. Based on the reachability plot, the algorithm can produce many clustering results. Building the reachability plot is slow, but reading the clusters from the reachability plot is fast.
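
To illustrate why reading clusters from a reachability plot is cheap, the following is a minimal Python sketch, not the implementation used in the thesis. It assumes the plot is already available as a list of reachability distances in OPTICS order, and it cuts the plot at a single threshold in one linear pass. The function name `clusters_from_reachability`, the `threshold` parameter, and the sample values are hypothetical and chosen only for illustration. The sketch is also a simplification: it ignores core distances, which the extraction procedure of Ankerst et al. (1999) additionally checks for the point at each peak.

```python
import math
from typing import List, Optional

NOISE = -1  # label for points that do not belong to any cluster


def clusters_from_reachability(reachability: List[float], threshold: float) -> List[int]:
    """Label the points (given in OPTICS order) by cutting the plot at `threshold`.

    Simplified assumption: a point whose reachability exceeds the threshold
    (a "peak") opens a new cluster only if the next point lies below the
    threshold; otherwise it is left as noise. Core distances are ignored.
    """
    labels = [NOISE] * len(reachability)
    cluster_id = -1
    pending: Optional[int] = None  # index of the most recent peak, not yet assigned

    for i, r in enumerate(reachability):
        if r > threshold:
            # Peak: remember it; it becomes a cluster start only if a valley follows.
            pending = i
            continue
        if pending is not None or cluster_id == -1:
            # First valley point after a peak (or at the very start): new cluster.
            cluster_id += 1
            if pending is not None:
                labels[pending] = cluster_id
                pending = None
        labels[i] = cluster_id

    return labels


# Made-up reachability plot: two valleys separated by a peak, then an outlier.
plot = [math.inf, 0.2, 0.3, 2.0, 0.25, 0.2, 5.0]

print(clusters_from_reachability(plot, 1.0))  # [0, 0, 0, 1, 1, 1, -1]
print(clusters_from_reachability(plot, 3.0))  # [0, 0, 0, 0, 0, 0, -1]
```

Both calls reuse the same plot: lowering or raising the threshold yields a different clustering in a single pass over the ordered points, without recomputing the ordering itself.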