Parallel Query Processing in a Cluster using MPI and File System Caching


N. Ch. S. N. Iyengar, Monis Huda, Pranav Juneja, Saurabh Jain, V Vijayasherly


Vol. 9, No. 5, pp. 249-254


Data-intensive applications that rely on very large databases spend much of their time on search and retrieval, especially when a single server retrieves all data from the database. This paper proposes a Beowulf cluster for fast query processing in which the database is partitioned horizontally across the nodes using a load-balancing scheme. A mathematical model is proposed to partition the data optimally among the nodes. Communication between nodes is achieved through MPI (Message Passing Interface). A file system cache, implemented with the Apache Lucene API, further decreases the query processing time. Results are served from the cache on a hit and retrieved from the database nodes on a miss. The size of the cache is monitored, and when it exceeds a threshold value, entries are deleted according to the LRU (least recently used) algorithm. Our experimental results show that caching reduces the query processing time substantially. The results can be improved further through query optimization, namely indexing the attributes used in complex queries. This approach reduces the query processing time many times over compared with a single overloaded server. As networks grow faster and secondary storage becomes more readily available, it is expected to perform even better in the future.
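The cache behaviour described in the abstract (serve on a hit, query on a miss, evict least recently used entries once a threshold is exceeded) can be illustrated with a minimal sketch. This is not the paper's implementation: it is an in-memory stand-in for the Lucene-backed file system cache, the threshold is assumed to be a count of cached queries rather than a size on disk, and the names `QueryCache`, `max_entries`, and `run_query` are hypothetical.

```python
from collections import OrderedDict


class QueryCache:
    """In-memory LRU cache of query results (illustrative stand-in only)."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries  # assumed threshold: number of cached queries
        self._entries = OrderedDict()   # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, query, run_query):
        """Return the result for `query`, calling `run_query` only on a miss."""
        if query in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(query)
            self.hits += 1
            return self._entries[query]

        # Cache miss: fetch from the backend (for example, the cluster nodes).
        self.misses += 1
        result = run_query(query)
        self._entries[query] = result

        # Evict least recently used entries while over the threshold.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return result

    def cached_queries(self):
        """Return cached queries, least recently used first."""
        return list(self._entries)
```

With `max_entries=2`, requesting queries `a`, `b`, `a`, `c` in that order leaves `a` and `c` cached: the repeated request for `a` is a hit that refreshes it, so `b` is the entry evicted when `c` arrives.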


Fast Query Processing, MPI, Load Balancing, File System Cache