Apache HBase

Overview
Apache HBase [1] is an open-source, distributed, versioned, column-oriented
store modeled after Google's Bigtable: A Distributed Storage System for
Structured Data by Chang et al.[2]  Just as Bigtable leverages the distributed
data storage provided by the Google File System, HBase provides Bigtable-like
capabilities on top of Apache Hadoop [3].

To get started using HBase, the full documentation for this release can be
found under the docs/ directory that accompanies this README.  Using a browser,
open docs/index.html to view the project home page (or browse to [1]).
The HBase 'book' at http://hbase.apache.org/book.html has a 'quick start'
section and is where you should begin your exploration of the HBase project.
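
As a first taste of the client API, the following is a minimal sketch of writing and reading a single cell in Java; it assumes a running local HBase with a table 'test' and a column family 'cf', as created in the quick start:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class QuickStart {
      public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath for cluster settings.
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("test"))) {
          // Write one cell, then read it back.
          Put put = new Put(Bytes.toBytes("row1"));
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
          table.put(put);
          Result result = table.get(new Get(Bytes.toBytes("row1")));
          System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("a"))));
        }
      }
    }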

The latest HBase can be downloaded from an Apache Mirror [4].

The source code can be found at [5]

The HBase issue tracker is at [6]

Apache HBase is made available under the Apache License, version 2.0 [7]

The HBase mailing lists and archives are listed here [8].

The HBase distribution includes cryptographic software. See the export control
notice here [9].

1. http://hbase.apache.org
2. http://research.google.com/archive/bigtable.html
3. http://hadoop.apache.org
4. http://www.apache.org/dyn/closer.lua/hbase/
5. https://hbase.apache.org/source-repository.html
6. https://hbase.apache.org/issue-tracking.html
7. http://hbase.apache.org/license.html
8. http://hbase.apache.org/mail-lists.html
9. https://hbase.apache.org/export_control.html
Issues
  • HBASE-23887 Up to 3x increase BlockCache performance

    When the data set is much larger than the BlockCache, we can save CPU cycles and increase performance up to 3x. PS: Sorry, I had some problems with the build in the previous PR, trying again.

    opened by pustota2009 134
  • HBASE-22027: Split non-MR related parts of TokenUtil off into a ClientTokenUtil, and move ClientTokenUtil to hbase-client

    See https://issues.apache.org/jira/browse/HBASE-22027

    opened by srdo 133
  • HBASE-23767 Add JDK11 compilation and unit test support to Github precommit

    Rebuild our Dockerfile with support for multiple JDK versions. Try to use it with Github/precommit.

    opened by ndimiduk 107
  • HBASE-22634 : Improve performance of BufferedMutator

    As requested in the Jira HBASE-22634
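
    For context, BufferedMutator buffers mutations client-side and sends them to the cluster in batches; a minimal usage sketch (the table, family, and qualifier names here are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.BufferedMutator;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedWrites {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("t1"))) {
          // Mutations are buffered client-side and flushed in batches.
          for (int i = 0; i < 1000; i++) {
            Put put = new Put(Bytes.toBytes("row-" + i));
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
            mutator.mutate(put);
          }
          mutator.flush(); // force any remaining buffered mutations out
        }
      }
    }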

    opened by sbarnoud 93
  • HBASE-11062 hbtop

    I removed the dependency on Lanterna to resolve the license problem.

    We can run hbtop with the 'hbase top' command, and press the 'h' key in the top screen for the help screen.

    For the details of hbtop, see this presentation from NoSQL Day 2019 (the name was changed from htop to hbtop): https://dataworkssummit.com/nosql-day-2019/session/supporting-apache-hbase-troubleshooting-and-supportability-improvements/

    opened by brfrn169 79
  • HBASE-24382 Flush partial stores of region filtered by seqId when archive wal due to too many wals

    opened by bsglz 77
  • HBASE-25869 WAL value compression

    WAL storage can be expensive, especially if the cell values represented in the edits are large, consisting of blobs or significant lengths of text. Such WALs might need to be kept around for a fairly long time to satisfy replication constraints on a space-limited (or space-contended) filesystem.

    We have a custom dictionary compression scheme for cell metadata that is engaged when WAL compression is enabled in site configuration. This is fine for that application, where we can expect the universe of values and their lengths in the custom dictionaries to be constrained. For arbitrary cell values it is better to use one of the available compression codecs, which are suitable for arbitrary albeit compressible data.

    Microbenchmark Results

    Site configuration used:

    <!-- retain all WALs  -->
    <property>
      <name>hbase.master.logcleaner.ttl</name>
      <value>604800000</value>
    </property>
    <!-- enable compression -->
    <property>
      <name>hbase.regionserver.wal.enablecompression</name>
      <value>true</value>
    </property>
    <!-- enable value compression -->
    <property>
      <name>hbase.regionserver.wal.value.enablecompression</name>
      <value>true</value>
    </property>
    <!-- set value compression algorithm -->
    <property>
      <name>hbase.regionserver.wal.value.compression.type</name>
      <value>snappy</value>
    </property>
    

    Loader: IntegrationTestLoadCommonCrawl

    Input: s3n://commoncrawl/crawl-data/CC-MAIN-2021-10/segments/1614178347293.1/warc/CC-MAIN-20210224165708-20210224195708-00000.warc.gz

    SNAPPY or ZSTD at level 1 are recommended; all other options are provided for comparison.
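
    The same settings can also be applied programmatically, for example in a test harness; a minimal sketch using the standard Configuration API and the property names from the XML above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class WalValueCompressionConfig {
      public static Configuration create() {
        Configuration conf = HBaseConfiguration.create();
        // Retain all WALs for 7 days (604800000 ms).
        conf.setLong("hbase.master.logcleaner.ttl", 604800000L);
        // Pre-existing dictionary compression for cell metadata.
        conf.setBoolean("hbase.regionserver.wal.enablecompression", true);
        // Value compression added by this patch, with the recommended codec.
        conf.setBoolean("hbase.regionserver.wal.value.enablecompression", true);
        conf.set("hbase.regionserver.wal.value.compression.type", "snappy");
        return conf;
      }
    }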

    Microbenchmarks are collected with this change. Statistics are collected over the lifetime of the regionserver and are dumped at the end of the test, at shutdown. Statistics are updated under synchronization, but this is done in a way that excludes that overhead from measurement. The normal patch contains neither the instrumentation nor the synchronization point. Nanoseconds are converted to milliseconds for the table.

    Mode | WALs aggregate size | WALs aggregate size difference | WAL writer append time (ms avg)
    -- | -- | -- | --
    Default | 5,117,369,553 | - | 0.290 (stddev 0.328)
    Compression enabled, value compression not enabled | 5,002,683,600 | (2.241%) | 0.372 (stddev 0.336)
    ~~Compression enabled, value compression enabled, v1 patch, Deflate (best speed)~~ | ~~1,209,947,515~~ | ~~(76.4%)~~ | ~~12.694 (stddev 8.48)~~
    Compression enabled, value compression enabled, v2 patch, algorithm=SNAPPY | 1,616,387,702 | (68.4%) | 0.027 (stddev 0.204)
    Compression enabled, value compression enabled, v2 patch, algorithm=ZSTD (best speed) | 1,149,008,133 | (77.55%) | 0.043 (stddev 0.195)
    Compression enabled, value compression enabled, v2 patch, algorithm=ZSTD (default) | 1,089,241,811 | (78.7%) | 0.056 (stddev 0.310)
    Compression enabled, value compression enabled, v2 patch, algorithm=ZSTD (best compression) | 941,452,655 | (81.2%) | 0.231 (stddev 1.11)
    Options below not recommended. | - | - | -
    Compression enabled, value compression enabled, v2 patch, algorithm=GZ | 1,082,414,015 | (78.9%) | 0.267 (stddev 1.325)
    Compression enabled, value compression enabled, v2 patch, algorithm=LZMA (level 1) | 1,013,951,637 | (80.2%) | 2.157 (stddev 3.302)
    Compression enabled, value compression enabled, v2 patch, algorithm=LZMA (default) | 940,884,618 | (81.7%) | 4.739 (stddev 8.609)

    opened by apurtell 77
  • HBASE-25975 Row Commit Sequencer

    Use a row commit sequencer in HRegion to ensure that only operations that mutate disjoint sets of rows are able to commit within the same clock tick. This maintains the invariant that no more than one mutation to a given row is ever committed in the same clock tick.

    Callers will first acquire row locks for the row(s) the pending mutation will mutate. Then they will use RowCommitSequencer.getRowSequence to ensure that the set of rows about to be mutated does not overlap with those for any other pending mutations in the current clock tick. If an overlap is identified, getRowSequence will yield and loop until there is no longer an overlap and the caller's pending mutation can succeed.
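
    To make the sequencing idea concrete, here is a simplified, hypothetical sketch; the patch's actual RowCommitSequencer API and implementation differ:

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical, simplified sketch of the idea above; the real
    // RowCommitSequencer in this patch differs in API and implementation.
    public final class RowCommitSequencerSketch {
      private long currentTick = System.currentTimeMillis();
      private final Set<String> rowsInTick = new HashSet<>();

      // Waits until none of the given rows has already committed in the
      // current clock tick, then reserves them and returns that tick.
      public synchronized long getRowSequence(Set<String> rows)
          throws InterruptedException {
        for (;;) {
          long now = System.currentTimeMillis();
          if (now != currentTick) {
            // New tick: no rows have committed in it yet.
            currentTick = now;
            rowsInTick.clear();
          }
          boolean overlap = false;
          for (String row : rows) {
            if (rowsInTick.contains(row)) {
              overlap = true;
              break;
            }
          }
          if (!overlap) {
            rowsInTick.addAll(rows);
            return currentTick;
          }
          // Another pending mutation touched one of these rows in this
          // tick; yield briefly and retry after the clock advances.
          wait(1);
        }
      }
    }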

    TODO:

    • Needs tests.
    • All I've confirmed as of now is TestHRegion and friends pass.
    opened by apurtell 4
  • HBASE-25913 Introduce EnvironmentEdge.Clock and Clock.currentTimeAdvancing

    • Introduce a Clock abstraction into EnvironmentEdge and define Clock#currentTimeAdvancing, which ensures that every call to this method returns an advancing time.

    • Use a per-region Clock in HRegion to ensure the time advances.

    The essential changes are in three files: BoundedIncrementYieldAdvancingClock, BaseEnvironmentEdge, and HRegion.

    I explored various options for implementing an advancing time; please refer to the microbenchmark results here and here. They are all included in this patch, although only BoundedIncrementYieldAdvancingClock is used.
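
    To illustrate the contract, here is a minimal, hypothetical sketch of an advancing clock; the patch's BoundedIncrementYieldAdvancingClock additionally bounds how far it may run ahead of the system clock:

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical, minimal sketch of the advancing-clock contract: every
    // call returns a time strictly greater than any previously returned.
    // The patch's BoundedIncrementYieldAdvancingClock also bounds how far
    // it may run ahead of the system clock; this sketch does not.
    public final class AdvancingClockSketch {
      private final AtomicLong lastTime = new AtomicLong();

      public long currentTimeAdvancing() {
        for (;;) {
          long last = lastTime.get();
          // Use the system clock, but never go backwards or repeat.
          long next = Math.max(System.currentTimeMillis(), last + 1);
          if (lastTime.compareAndSet(last, next)) {
            return next;
          }
        }
      }
    }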

    TODO:

    • One reasonable HRegion-based test that ensures the timestamp substitutions made in a tight loop (doing more than one per clock tick) are all unique.

    • We optimize for single-row updates. However, for updates involving more than one row we go immediately to the region scope. We could imagine taking the above idea further and making it formal as a row clock, keeping track of row clocks similar to row locks (maybe they could be combined), and taking some kind of union-of-row-clocks for all rows involved in a batch mutation. But this would be fairly complex, and we need the initial changes to be reasonably reviewable, so it could be considered as follow-up work.

    opened by apurtell 11
  • HBASE-25950 add basic compaction server metric

    The ServerLoad has RegionLoad and ReplicationLoadSource/ReplicationLoadSink metrics, which have nothing to do with the compaction server. So, introduce CompactionServerLoad, which only has compaction-related metrics.

    opened by nyl3532016 6