Understanding TunnelBufferedWriter vs TunnelRecordWriter in MaxCompute

3 minute read

Understanding TunnelBufferedWriter vs TunnelRecordWriter in MaxCompute

When uploading data via the MaxCompute Tunnel SDK, there are two primary writer classes available:

TunnelBufferedWriter
TunnelRecordWriter

While both are used to write records into a table, they differ significantly in how they manage block IDs and ensure thread safety. This post explores the differences and implications of choosing one over the other.

🧱 What Is a “Block”?

In the context of the Tunnel SDK, a block is a unit of data written to MaxCompute in one session. Instead of writing row-by-row in real-time, data is accumulated and pushed in blocks for performance and consistency.

🧠 Core Difference: Block ID Management

openRecordWriter(blockId) manually opens a block writer for the given ID.
TunnelBufferedWriter automatically manages block IDs behind the scenes.

`TunnelRecordWriter`

You must manually assign a blockId when writing.

Example:

TunnelRecordWriter writer = session.openRecordWriter(0L);
writer.write(record);
writer.close();
session.commit(new Long[] {0L});

// TableTunnel.UploadSession.java
  private RecordWriter openRecordWriterInternal(long blockId, CompressOption compress, long blockVersion) throws TunnelException {
    TunnelRetryHandler retryHandler = new TunnelRetryHandler(this.conf);

    try {
        return (RecordWriter)retryHandler.executeWithRetry(() -> {
            Connection conn = null;

            IOException e;
            try {
                e = null;
                conn = this.getConnection(blockId, compress, blockVersion);
                TunnelRecordWriter writer = new TunnelRecordWriter(this.schema, conn, compress);
                writer.setTransform(this.shouldTransform);
                return writer;
            } catch (IOException var10) {
                e = var10;
                if (conn != null) {
                    try {
                        conn.disconnect();
                    } catch (IOException var9) {
                    }
                }

                throw e;
            }
        });
    } catch (RuntimeException var8) {
        RuntimeException re = var8;
        throw re;
    } catch (Exception var9) {
        Exception e = var9;
        throw new TunnelException(e.getMessage(), e);
    }
}

An upload session opens a new RecordWriter for a specific block ID provided manually.

`TunnelBufferedWriter`

Automatically assigns and manages blockIds via internal counters.

Uses a synchronized method like:

 //TunnelBufferedWriter.java
public void flush() throws IOException {
  this.checkStatus();
  long delta = this.bufferedPack.getTotalBytesWritten();
  if (delta > 0L) {
    Long blockId = this.session.getAvailBlockId();
    long version = 0L;
    if (this.versionProvider != null) {
      version = this.versionProvider.generateVersion(blockId);
    }

    if (this.versionProvider != null) {
      try {
        this.session.writeBlock(blockId, this.bufferedPack, this.timeout, version);
      } catch (TunnelException var7) {
        TunnelException e = var7;
        throw new IOException("Generate block version invalid", e);
      }
    } else {
      this.session.writeBlock(blockId, this.bufferedPack, this.timeout);
    }

    this.bufferedPack.reset();
    this.bytesWritten += delta;
  }

}

//TableTunnel.java
public synchronized Long getAvailBlockId() {
    if (this.curBlockId >= this.totalBLocks) {
        throw new RuntimeException("No more available blockId, already " + this.curBlockId);
    } else {
        Long old = this.curBlockId;
        this.curBlockId = this.curBlockId + this.shares;
        return old;
    }
}

This ensures that even in multithreaded environments, block IDs are allocated safely.

🔐 Thread Safety Considerations

With `TunnelRecordWriter`

You are in charge of assigning block IDs. This means:

In concurrent environments, collisions can happen unless you manually synchronize access.
It’s prone to human error when used across multiple threads or workers.

With `TunnelBufferedWriter`

The writer:

Internally manages block IDs
Guarantees thread-safe ID allocation using synchronized
Provides a more robust experience for parallelized or distributed writing tasks

✅ What Does session.commit() Actually Do?

With `TunnelRecordWriter`

You must explicitly commit the block(s) you’ve written:

session.commit(new Long[] {blockId});

Internally, the method verifies:

The blocks you provided match those uploaded to the server
Each provided block exists on the server
Fails if there’s a mismatch

With `TunnelBufferedWriter`

Since block IDs are internally managed and tracked:

You can simply call:

session.commit();

The SDK already knows which blocks were uploaded, and performs the finalization accordingly

While commit() may seem simpler, it’s best used when block IDs are managed by the SDK (e.g., via TunnelBufferedWriter). For more strict validation and manual block control (e.g., when using TunnelRecordWriter), prefer commit(blocks) to avoid silent mismatches.

📝 Takeaway

Both TunnelBufferedWriter and TunnelRecordWriter serve valid use cases, but understanding their differences in block management and thread safety is key to choosing the right tool.

If you’re building a scalable data pipeline or handling concurrent writes, prefer TunnelBufferedWriter. For single-threaded or fine-grained control, TunnelRecordWriter gives you more flexibility—at the cost of more responsibility.

Reference

docs
PR

Share on

Twitter Facebook LinkedIn

dybyte

Understanding TunnelBufferedWriter vs TunnelRecordWriter in MaxCompute

Understanding TunnelBufferedWriter vs TunnelRecordWriter in MaxCompute

🧱 What Is a “Block”?

🧠 Core Difference: Block ID Management

`TunnelRecordWriter`

`TunnelBufferedWriter`

🔐 Thread Safety Considerations

With `TunnelRecordWriter`

With `TunnelBufferedWriter`

✅ What Does session.commit() Actually Do?

With `TunnelRecordWriter`

With `TunnelBufferedWriter`

📝 Takeaway

Reference

Share on

You may also enjoy

Upsert Session: Handling New Checked Exceptions

메모리 최적화 측면에서 배열 vs 변수 방식

Multi-level Feedback Queue

filter & servlet

dybyte

Understanding TunnelBufferedWriter vs TunnelRecordWriter in MaxCompute

🧱 What Is a “Block”?

🧠 Core Difference: Block ID Management

TunnelRecordWriter

TunnelBufferedWriter

🔐 Thread Safety Considerations

With TunnelRecordWriter

With TunnelBufferedWriter

✅ What Does session.commit() Actually Do?

With TunnelRecordWriter

With TunnelBufferedWriter

📝 Takeaway

Reference

Share on

You may also enjoy

Upsert Session: Handling New Checked Exceptions

메모리 최적화 측면에서 배열 vs 변수 방식

Multi-level Feedback Queue

filter & servlet

`TunnelRecordWriter`

`TunnelBufferedWriter`

With `TunnelRecordWriter`

With `TunnelBufferedWriter`

With `TunnelRecordWriter`

With `TunnelBufferedWriter`