Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata
RISE - Research Institutes of Sweden, ICT, SICS.
2017 (English). In: Proceedings - International Conference on Distributed Computing Systems, 2017, pp. 2525-2528. Conference paper, Published paper (Refereed)
Abstract [en]

Hadoop is a popular system for storing, managing, and processing large volumes of data, but it has bare-bones internal support for metadata, as metadata is a bottleneck and less means more scalability. The result is a scalable platform with rudimentary access control that is neither user- nor developer-friendly. Also, metadata services that are built on Hadoop, such as SQL-on-Hadoop, access control, data provenance, and data governance are necessarily implemented as eventually consistent services, resulting in increased development effort and more brittle software. In this paper, we present a new project-based multi-tenancy model for Hadoop, built on a new distribution of Hadoop that provides a distributed database backend for the Hadoop Distributed Filesystem's (HDFS) metadata layer. We extend Hadoop's metadata model to introduce projects, datasets, and project-users as new core concepts that enable a user-friendly, UI-driven Hadoop experience. As our metadata service is backed by a transactional database, developers can easily extend metadata by adding new tables and ensure the strong consistency of extended metadata using both transactions and foreign keys.
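
The last sentence of the abstract can be illustrated with a minimal sketch. It is not the paper's schema or API: the table names `inodes` and `dataset_tags`, the helper functions, and the use of an in-memory SQLite database in place of the distributed database backend are all assumptions made for illustration. The sketch only shows the general idea of extending file metadata with a new table whose consistency is enforced by a foreign key and a transaction.

```python
import sqlite3


def build_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical core metadata table and one extension table."""
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Stand-in for core file metadata (hypothetical, not the real schema).
        CREATE TABLE inodes (
            id   INTEGER PRIMARY KEY,
            path TEXT NOT NULL UNIQUE
        );
        -- Extended metadata added by a developer; the foreign key ties each
        -- tag to an existing file and removes it when the file is deleted.
        CREATE TABLE dataset_tags (
            inode_id INTEGER NOT NULL REFERENCES inodes(id) ON DELETE CASCADE,
            tag      TEXT NOT NULL,
            PRIMARY KEY (inode_id, tag)
        );
        """
    )


def create_tagged_file(conn: sqlite3.Connection, path: str, tag: str) -> int:
    """Insert a file and its tag in one transaction (both or neither)."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute("INSERT INTO inodes (path) VALUES (?)", (path,))
        inode_id = cur.lastrowid
        conn.execute(
            "INSERT INTO dataset_tags (inode_id, tag) VALUES (?, ?)",
            (inode_id, tag),
        )
    return inode_id


def delete_file(conn: sqlite3.Connection, path: str) -> None:
    """Delete a file; its tags are removed by the cascading foreign key."""
    with conn:
        conn.execute("DELETE FROM inodes WHERE path = ?", (path,))


def count_tags(conn: sqlite3.Connection) -> int:
    """Return the number of rows in the extension table."""
    return conn.execute("SELECT COUNT(*) FROM dataset_tags").fetchone()[0]
```

In this sketch, a tag cannot refer to a file that does not exist, and deleting a file never leaves orphaned tags behind. An eventually consistent metadata service built beside the filesystem would have to detect and repair such orphans itself.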

Place, publisher, year, edition, pages
2017. 2525-2528 p.
Keyword [en]
Data Management, Dynamic Roles, Hadoop, Multi-tenancy, Access control, Data flow analysis, Information management, Metadata, Data provenance, Distributed database, Metadata services, Strong consistency, Transactional database, Distributed computer systems
National Category
Natural Sciences
Identifiers
URN: urn:nbn:se:ri:diva-30835
DOI: 10.1109/ICDCS.2017.41
Scopus ID: 2-s2.0-85027275789
ISBN: 9781538617915
OAI: oai:DiVA.org:ri-30835
DiVA: diva2:1139370
Conference
37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, 5 June 2017 through 8 June 2017
Available from: 2017-09-07 Created: 2017-09-07 Last updated: 2017-09-07 Bibliographically approved