WEB PAGES CONTENT ANALYSIS USING BROWSER-BASED VOLUNTEER COMPUTING
DOI: https://doi.org/10.7494/csci.2013.14.2.215
Abstract
Existing solutions to the problem of finding valuable information on the Web suffer from several limitations, such as simplified query languages, out-of-date information, or arbitrary sorting of results. This paper describes a different approach to the problem, based on distributed processing of Web page content. To provide sufficient performance, the approach uses browser-based volunteer computing, which requires implementing the text processing algorithms in JavaScript. The paper presents the architecture of a Web page content analysis system, describes details of the system's implementation and its text processing algorithms, and provides test results.