Towards Transparent Data Access with Context Awareness

Michał Wrzeszcz, Jacek Kitowski, Renata Słota

Abstract


Applying the principles of open research data is an important factor in accelerating the production and analysis of scientific results and in fostering worldwide collaboration. However, very little data is still being shared. This article analyzes existing data access solutions in order to identify the reasons for this situation. Based on an analysis of existing solutions and of the needs of data access stakeholders, the authors propose their own vision of how the data access model should evolve.

Keywords


distributed data access, transparent data access, distributed data storage, context awareness


DOI: http://dx.doi.org/10.7494/csci.2018.19.2.2844