WEB PAGES CONTENT ANALYSIS USING BROWSER-BASED VOLUNTEER COMPUTING

Authors

  • Wojciech Turek
  • Edward Nawarecki
  • Grzegorz Dobrowolski
  • Tomasz Krupa
  • Przemysław Majewski

DOI:

https://doi.org/10.7494/csci.2013.14.2.215

Abstract

Existing solutions to the problem of finding valuable information on the Web suffer from several limitations, such as simplified query languages, out-of-date information, and arbitrary sorting of results. This paper describes a different approach to the problem, based on distributed processing of Web page content. To provide sufficient performance, the approach uses browser-based volunteer computing, which requires implementing the text processing algorithms in JavaScript. The paper presents the architecture of a Web page content analysis system, describes the implementation of the system and of the text processing algorithms, and provides test results.

Published

2013-06-17

Issue

Section

Articles

How to Cite

WEB PAGES CONTENT ANALYSIS USING BROWSER-BASED VOLUNTEER COMPUTING. (2013). Computer Science, 14(2), 215. https://doi.org/10.7494/csci.2013.14.2.215
