WEB PAGES CONTENT ANALYSIS USING BROWSER-BASED VOLUNTEER COMPUTING

Wojciech Turek; Edward Nawarecki; Grzegorz Dobrowolski; Tomasz Krupa; Przemysław Majewski

doi:10.7494/csci.2013.14.2.215

Abstract

Existing solutions to the problem of finding valuable information on the Web suffer from several limitations, such as simplified query languages, out-of-date information, and arbitrary sorting of results. This paper describes a different approach to the problem, based on distributed processing of Web page content. To provide sufficient performance, it uses browser-based volunteer computing, which requires implementing the text processing algorithms in JavaScript. The paper presents the architecture of a Web page content analysis system, describes the details of its implementation and of the text processing algorithms, and provides test results.
