What is HBase?
Posted by tib on July 23rd, 2019

HBase is an open source, sorted map data store built on Hadoop. It is column oriented and horizontally scalable, and it is modeled on Google's Bigtable. It consists of a set of tables that keep data in key-value format. HBase is well suited to the distributed data sets that are common in big data use cases. HBase provides APIs that enable development in practically any programming language. It is a part of the Hadoop ecosystem that provides random, real-time read/write access to data in the Hadoop file system.

Why HBase?
Features of HBase
HBase Read
A read against HBase must be reconciled between the HFiles, the MemStore and the BlockCache. The BlockCache is designed to keep frequently accessed data from the HFiles in memory so as to avoid disk reads. Block caching can be turned on or off per column family. The BlockCache holds data in the form of blocks, a block being the unit of data that HBase reads from disk in a single pass. An HFile is physically laid out as a sequence of blocks plus an index over those blocks, so reading a block from HBase only requires looking up that block's location in the index and retrieving it from disk.

Block: the smallest indexed unit of data, and the smallest unit of data that can be read from disk. The default size is 64 KB.

When a smaller block size is preferred: for random lookups. Smaller blocks create a larger index and therefore consume more memory.

When a larger block size is preferred: for frequent sequential scans. Larger blocks mean fewer index entries and therefore a smaller index, which saves memory.

Reading a row from HBase requires first checking the MemStore, then the BlockCache; finally, the HFiles on disk are accessed.

HBase Write
When a write is made, by default it goes into two places: the write-ahead log (WAL) and the MemStore.
Clients do not interact directly with the underlying HFiles during writes; instead, writes go to the WAL and the MemStore in parallel. Each write to HBase requires confirmation from both the WAL and the MemStore.

HBase MemStore
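As a rough illustration of the read and write paths described so far, here is a toy in-memory model in Python. It is not HBase code: the class `TinyRegion`, its method names and the two-row block size are all made up for this sketch, and the block cache is kept inside the same object only for brevity. It shows the order of operations only: a write is recorded in both the WAL and the MemStore, a read checks the MemStore, then the block cache, then the file blocks, and an unflushed MemStore can be rebuilt by replaying the WAL.

```python
from bisect import bisect_right


class TinyRegion:
    """Toy model of the read/write path (hypothetical, not the HBase API)."""

    def __init__(self):
        self.wal = []          # append-only log of (row, value) edits
        self.memstore = {}     # recent writes that have not been flushed yet
        self.block_cache = {}  # blocks already read, keyed by (file_no, block_no)
        self.hfiles = []       # each "HFile" is a list of sorted blocks, newest last
        self.disk_reads = 0    # counts simulated block reads from disk

    def put(self, row, value):
        # Record the edit in the WAL and in the MemStore; the write is
        # acknowledged only once both have accepted it.
        self.wal.append((row, value))
        self.memstore[row] = value
        return True

    def flush(self, block_size=2):
        # Persist the MemStore as a sorted, immutable file split into blocks.
        rows = sorted(self.memstore.items())
        blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
        if blocks:
            self.hfiles.append(blocks)
        self.memstore = {}
        self.wal = []  # flushed edits no longer need to be replayed

    def get(self, row):
        # 1. The MemStore holds the newest data, so check it first.
        if row in self.memstore:
            return self.memstore[row]
        # 2. Walk the files from newest to oldest.
        for file_no in range(len(self.hfiles) - 1, -1, -1):
            blocks = self.hfiles[file_no]
            # Block index: the first row key of every block in this file.
            index = [block[0][0] for block in blocks]
            block_no = bisect_right(index, row) - 1
            if block_no < 0:
                continue  # row sorts before the first block of this file
            key = (file_no, block_no)
            # 3. Serve the block from the cache, reading "disk" only on a miss.
            if key not in self.block_cache:
                self.disk_reads += 1
                self.block_cache[key] = dict(blocks[block_no])
            if row in self.block_cache[key]:
                return self.block_cache[key][row]
        return None

    def recover(self):
        # After a crash the MemStore is lost; rebuild it by replaying the WAL.
        self.memstore = {}
        for row, value in self.wal:
            self.memstore[row] = value
```

In this model, reading the same flushed row twice increases `disk_reads` only once, because the second read is served from the cached block. Passing a larger `block_size` to `flush` produces fewer blocks and so a shorter index, which mirrors the trade-off between random lookups and sequential scans.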
What happens when the server hosting a MemStore that has not yet been flushed crashes? Every server in an HBase cluster keeps a WAL to record changes as they happen. The WAL is a file on the underlying file system. A write is not considered successful until the new WAL entry has been successfully written, which guarantees durability.

RDBMS vs HBase
The differences between an RDBMS and HBase are given below.