The Big Data Queries: HBase Interview Questions

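What is the similarity between HBase and RDBMS?
Both store data in tables made up of rows and columns. Unlike an RDBMS, HBase is not fully ACID compliant: it guarantees atomicity only for operations on a single row, so multi-row transactional applications need extra care.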
What is HBase design based upon?
HBase design is based upon Google's Bigtable.
What is a distinctive feature supported by HBase?
HBase supports versioning out of the box.
How is versioning implemented in HBase?
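Versioning is implemented using the timestamp field: every cell value is stored with a timestamp, and several timestamped versions of the same cell can coexist.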
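
The idea of timestamp-based versioning can be sketched with a small in-memory model. This is only an illustration, not the HBase client API: the class `VersionedCells`, its methods, and the `max_versions` parameter are hypothetical names chosen for this example.

```python
from collections import defaultdict


class VersionedCells:
    """Toy model: each (row, column) pair keeps several timestamped values."""

    def __init__(self, max_versions=3):
        # max_versions is a hypothetical setting for this sketch.
        self.max_versions = max_versions
        self._cells = defaultdict(dict)  # (row, column) -> {timestamp: value}

    def put(self, row, column, value, timestamp):
        versions = self._cells[(row, column)]
        versions[timestamp] = value
        # Keep only the newest max_versions timestamps.
        for old in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old]

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (newest overall if None)."""
        versions = self._cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None


store = VersionedCells(max_versions=2)
store.put("row1", "cf:name", "v1", timestamp=100)
store.put("row1", "cf:name", "v2", timestamp=200)
store.put("row1", "cf:name", "v3", timestamp=300)

latest = store.get("row1", "cf:name")                    # newest version
as_of_250 = store.get("row1", "cf:name", timestamp=250)  # version visible at t=250
```

With two versions retained, the oldest value is dropped once a third one is written, which is why a read "as of" an early timestamp can come back empty.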
What are the different types of compression algorithms supported by HBase?
Gzip and Lempel-Ziv-Oberhumer (LZO). Snappy is also supported.
Which compression algorithm comes packaged with HBase?
Gzip.
Why is LZO not packaged with HBase? How can it be included?
Due to licensing issues, LZO is not packaged with HBase: LZO is GPL-licensed, which is incompatible with HBase's Apache license. It can be downloaded and installed separately.
What is a region?
A region is a contiguous chunk of rows identified by a starting key (inclusive) and an ending key (exclusive).
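
A rough sketch of how a row key maps to a region under the inclusive-start, exclusive-end rule. The start keys and region names below are made up for illustration, and the lookup is a plain binary search rather than anything taken from HBase itself.

```python
import bisect

# Hypothetical region boundaries: region i covers [start_keys[i], start_keys[i + 1]).
# The first region starts at the empty key; the last one has no upper bound.
REGION_START_KEYS = [b"", b"g", b"n", b"t"]
REGION_NAMES = ["region-1", "region-2", "region-3", "region-4"]


def find_region(row_key: bytes) -> str:
    # bisect_right counts the start keys that are <= row_key, so the
    # region that owns the key is the one just before that position.
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_NAMES[index]


# b"g" is the inclusive start of region-2; b"n" is its exclusive end,
# so b"n" already belongs to region-3.
owner_of_g = find_region(b"g")
owner_of_n = find_region(b"n")
```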
How are rows kept in HBase?
Rows are kept sorted by row key, in lexicographic byte order.
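
Because row keys are compared as bytes, numeric suffixes do not sort numerically unless they are padded. The snippet below is a small illustration using Python's own byte-string ordering, which is also lexicographic; the `padded_key` helper and the key format are hypothetical.

```python
# Byte-wise ordering puts b"row10" before b"row2", because b"1" < b"2".
row_keys = [b"row2", b"row10", b"row1"]
sorted_keys = sorted(row_keys)


def padded_key(n: int, width: int = 5) -> bytes:
    """Hypothetical helper: zero-pad the number so byte order matches numeric order."""
    return f"row{n:0{width}d}".encode()


sorted_padded = sorted(padded_key(n) for n in (2, 10, 1))
```

Zero-padding (or a fixed-width binary encoding) is a common way to make scan order match the intended numeric order.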
How are region to region server assignments managed?
The HBase master (HMaster) assigns regions to region servers. It relies on ZooKeeper (a distributed coordination service) to track region server state and coordinate the assignment.
What are the two special tables in HBase?
-ROOT- and .META. (HBase 0.96 and later drop -ROOT- and rename .META. to hbase:meta.)
What does the .META. table store?
It keeps track of the regions of all user tables and which region servers are responsible for serving them.
Does one table map to one region?
No. As the size of the table grows, more regions are created and spread across the entire cluster.
How are write operations performed in HBase?
HBase records each write in the WAL (write-ahead log) before applying it to the in-memory MemStore, which is later flushed to disk.
Is writing to WAL mandatory?
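No, writing to the WAL is not mandatory, but it is enabled by default.
How can you control the WAL setting?
Writing to the WAL can be turned off per operation with the setWriteToWAL() method, for example on a Put. Newer client versions deprecate this method in favor of setDurability().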
What is the advantage of disabling WAL?
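It improves write performance, at the risk of losing data that has not yet been flushed to disk if a region server fails.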
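
The trade-off can be illustrated with a toy write path. This is a simplified model under stated assumptions, not HBase code: `ToyRegionServer`, its `write_to_wal` flag, and `crash_and_recover` are hypothetical names, and a Python list stands in for the durable log.

```python
class ToyRegionServer:
    """Toy write path: a durable log plus a volatile in-memory store."""

    def __init__(self):
        self.wal = []       # stands in for the log that survives a crash
        self.memstore = {}  # in-memory only; lost on a crash

    def put(self, row, value, write_to_wal=True):
        if write_to_wal:
            # Log first, so the edit can be replayed after a failure.
            self.wal.append((row, value))
        self.memstore[row] = value

    def crash_and_recover(self):
        # A crash wipes memory; recovery replays whatever reached the log.
        self.memstore = {}
        for row, value in self.wal:
            self.memstore[row] = value


server = ToyRegionServer()
server.put("row1", "logged")
server.put("row2", "not logged", write_to_wal=False)  # faster, but not durable
server.crash_and_recover()
recovered = dict(server.memstore)
```

Only the edit that went through the log survives the simulated crash, which is the durability cost paid for the faster write.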
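What do Bloom filters help determine?
Bloom filters help determine whether a store file might contain a given row key, or a given column for a row key. A negative answer is definite, so HBase can skip files that cannot hold the requested data.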
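
A minimal Bloom filter sketch shows why the answer is "definitely not" or "maybe". The sizes, the SHA-256 based hashing, and the class name `ToyBloomFilter` are assumptions for this example and do not reflect how HBase builds its filters.

```python
import hashlib


class ToyBloomFilter:
    """Toy Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: bytes):
        # Derive num_hashes positions by prefixing the key with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        # If any position is unset, the key was never added.
        return all(self.bits[pos] for pos in self._positions(key))


row_filter = ToyBloomFilter()
row_filter.add(b"row1")
added_key_found = row_filter.might_contain(b"row1")
empty_filter_hit = ToyBloomFilter().might_contain(b"row1")
```

A reader would consult such a filter before opening a file: a `False` result means the file can be skipped, while `True` only means the file has to be checked.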
Why are operations that alter column family characteristics expensive?
HBase creates a new column family with the new specification, copies all the data over from the old column family, and then deletes the old one.
What are the three different running modes supported by HBase?
Standalone mode, pseudo-distributed mode, and fully distributed mode.

Sunday, March 2, 2014
