Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata
KTH Royal Institute of Technology, Sweden.
RISE - Research Institutes of Sweden, ICT, SICS.
RISE - Research Institutes of Sweden, ICT, SICS. ORCID iD: 0000-0002-9216-7785
RISE - Research Institutes of Sweden, ICT, SICS. ORCID iD: 0000-0003-0571-1197
2017 (English)
In: Proceedings - International Conference on Distributed Computing Systems, 2017, p. 2525-2528
Conference paper, Published paper (Refereed)
Abstract [en]

Hadoop is a popular system for storing, managing, and processing large volumes of data, but it has bare-bones internal support for metadata, as metadata is a bottleneck and less means more scalability. The result is a scalable platform with rudimentary access control that is neither user- nor developer-friendly. Also, metadata services that are built on Hadoop, such as SQL-on-Hadoop, access control, data provenance, and data governance are necessarily implemented as eventually consistent services, resulting in increased development effort and more brittle software. In this paper, we present a new project-based multi-tenancy model for Hadoop, built on a new distribution of Hadoop that provides a distributed database backend for the Hadoop Distributed Filesystem's (HDFS) metadata layer. We extend Hadoop's metadata model to introduce projects, datasets, and project-users as new core concepts that enable a user-friendly, UI-driven Hadoop experience. As our metadata service is backed by a transactional database, developers can easily extend metadata by adding new tables and ensure the strong consistency of extended metadata using both transactions and foreign keys.
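
The closing idea of the abstract, extending metadata with new tables kept consistent through transactions and foreign keys, can be illustrated with a minimal sketch. This is not the paper's implementation: SQLite (via Python's standard sqlite3 module) stands in for the distributed database backend, and the table names inodes and dataset_tags, as well as all function names, are hypothetical.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a core table and an extension table."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Hypothetical stand-in for the file system's core metadata.
        CREATE TABLE inodes (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- Hypothetical extended metadata added by a developer.
        CREATE TABLE dataset_tags (
            inode_id INTEGER NOT NULL
                     REFERENCES inodes(id) ON DELETE CASCADE,
            tag      TEXT NOT NULL
        );
        """
    )
    return conn


def add_file_with_tag(conn, name, tag):
    """Insert a file and its tag atomically; both rows commit or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute("INSERT INTO inodes (name) VALUES (?)", (name,))
        conn.execute(
            "INSERT INTO dataset_tags (inode_id, tag) VALUES (?, ?)",
            (cur.lastrowid, tag),
        )
        return cur.lastrowid


def delete_file(conn, inode_id):
    """Delete a file; the foreign key cascade removes its extended metadata."""
    with conn:
        conn.execute("DELETE FROM inodes WHERE id = ?", (inode_id,))


def count_tags(conn):
    """Return the number of rows in the extension table."""
    return conn.execute("SELECT COUNT(*) FROM dataset_tags").fetchone()[0]
```

In this sketch the foreign key prevents a tag from referring to a file that does not exist, and the cascade removes tags when their file is deleted, so the extended metadata cannot drift out of sync with the core table in the way an eventually consistent side service could.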

Place, publisher, year, edition, pages
2017. p. 2525-2528
Keywords [en]
Data Management, Dynamic Roles, Hadoop, Multi-tenancy, Access control, Data flow analysis, Information management, Metadata, Data provenance, Distributed database, Metadata services, Strong consistency, Transactional database, Distributed computer systems
National Category
Natural Sciences
Identifiers
URN: urn:nbn:se:ri:diva-30835
DOI: 10.1109/ICDCS.2017.41
Scopus ID: 2-s2.0-85027275789
ISBN: 9781538617915 (print)
OAI: oai:DiVA.org:ri-30835
DiVA, id: diva2:1139370
Conference
37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, 5 June 2017 through 8 June 2017
Available from: 2017-09-07 Created: 2017-09-07 Last updated: 2023-05-22 Bibliographically approved
Authority records

Kakantousis, Theofilos; Berthou, Gautier; Dowling, Jim