Table of contents
Steps
Calculate Expected Bucket Size:
- Divide the table size by the block size on Hadoop to get an initial estimate.
Expected Bucket Size = Table Size / Block Size on Hadoop
Find the Nearest Power of 2:
- Take the base-2 logarithm of the initial estimate to find the nearest power of 2.
n = log(Expected Bucket Size, 2)
Calculate Final Bucket Size:
- Use the rounded-off value to calculate the final bucket size by raising 2 to the power of the rounded value.
Bucket Size = 2 ^ Rounded Value
Example Calculation:
Assume:
Table Size = 2300 MB
Block Size on Hadoop = 128 MB
Calculate Expected Bucket Size:
Expected Bucket Size = 2300 / 128 = 17.96875
Find the Nearest Power of 2:
n = log(17.96875, 2) โ 4.167418
Round off to the Nearest Integer:
Rounded Value โ 5
Calculate Final Bucket Size:
Bucket Size = 2^5 = 32
Result:
Therefore, the bucket size for the given example is 32.
Reference:
ย