Transform-Based Lossless Image Compression Algorithm for Electron Beam Direct Write Lithography Systems



Introduction
Conventional photolithography systems use physical masks which are expensive and difficult to create and cannot be used forever. Electron Beam Direct Write (EBDW) lithography systems are a noteworthy alternative which do not need physical masks [Chokshi et al. (1999)]. As shown in Figure 1, they rely on an array of lithography writers to directly write a mask image on a photo-resist coated wafer using electron beams. EBDW systems are attractive for a few reasons: First, their flexibility is advantageous in processes requiring the rapid prototyping of chips. Second, they are known to reduce fabrication costs [Lin (2009)]. Third, they are well suited for Next-Generation Lithography (NGL) because they are able to produce circuits with smaller features than state-of-the-art photolithography systems. Finally, since the mask images are electronically controlled, EBDW systems could be improved by software. Our focus here will be on this last point.

EBDW is not at this time used in many circuit fabrication processes because it is much slower than physical mask lithography systems. One current focus of research to address the throughput problem is massively-parallel electron beam lithography. Some of the research groups/companies which are developing such systems include KLA-Tencor [Petric et al. (2009)], IMS [Klein et al. (2009)], and MAPPER [Wieland et al. (2009)]. Chokshi et al. (1999) proposed a maskless lithography system using a bank of 80,000 lithography writers running in parallel at 24 MHz. Dai & Zakhor (2006) pointed out that this lithography system can achieve the conventional photolithography throughput of one wafer layer per minute, but layout image data is often several hundred terabits per wafer and therefore data delivery becomes an important issue. Dai & Zakhor (2006) proposed using a data delivery system with a lossless image compression component, which is illustrated in Figure 2. They hold compressed layout images in storage disks and transmit the compressed data to the processor memory board. This kind of EBDW lithography system can achieve higher throughput if the decoder embedded within the lithography writer can sufficiently rapidly recover the original images from the compressed files. Dai (2008) discussed two constraints on this type of system: 1) the compression ratio should be at least (Transfer rate of Decoder to Writer / Transfer rate of Memory to Decoder), and 2) the decoding algorithm has to be simple enough to be implemented as a small add-on within the maskless lithography writer. Therefore the decoder must operate with little memory.

Their experimental results indicate that their approach is often more efficient than the context prediction method used in C4 and Block C4.

In this paper we extend the work of Yang & Savari (2011) to gray-level images to better address the issue of handling proximity correction for EBDW systems and show that we obtain better compression performance and faster encoding/decoding than C4 and Block C4. Hence our work can be used to solve the data delivery problem of EBDW lithography systems with smaller features. Moreover, since our decoding speed is faster than C4 and Block C4, we can improve the throughput of the EBDW lithography system.

Overview
Layout image data is commonly cached in GDSII [Rubin (1987)] or OASIS [Chen et al. (2004)] formats. GDSII and OASIS describe circuit features such as polygons and lines by their corner points [see Rubin (1987) and Reich et al. (2003)]. GDSII and OASIS formatted data are far more compact than the uncompressed image of a circuit layer. GDSII and OASIS therefore initially seem to be well suited for this application. The problem is that EBDW writers operate directly on pixel bit streams, so GDSII and OASIS layout representations must be converted into layout images before the writing process begins. The conversion process involves 1) flattening the hierarchical layout structure, 2) placing the circuit features such as polygons and lines into the correct layers of the circuit, and 3) rasterizing (see Figure 3). This conversion process often lasts hours or even days using a complex computer system with large memory and cannot be executed by the decoder chip.
The final rasterizing step consists of two parts: a) it produces a binary image on a finer grid, and b) the binary image is processed in blocks to generate a gray-level image. In part b) the binary image is partitioned into m × m pixel blocks. For each block, the number of filled pixels is counted and normalized/quantized to the corresponding gray level.
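The block-based part of the rasterizing step can be sketched as follows. This is a minimal illustration, not the rasterizer used in the paper: the function name `rasterize_to_gray`, the parameter `n_levels`, and the linear round-half-up quantization rule are assumptions, since the text only states that the fill count is "normalized/quantized".

```python
def rasterize_to_gray(binary, m, n_levels):
    """Map a binary image (list of rows of 0/1) to a gray-level image.

    Each m x m block of the fine-grid binary image becomes one gray pixel.
    Assumption: the fill count is scaled linearly to 0..n_levels-1 and
    rounded half up; the source does not specify the quantizer.
    """
    rows, cols = len(binary) // m, len(binary[0]) // m
    gray = []
    for by in range(rows):
        out_row = []
        for bx in range(cols):
            # Count the filled pixels inside the current m x m block.
            filled = sum(
                binary[by * m + dy][bx * m + dx]
                for dy in range(m)
                for dx in range(m)
            )
            # Integer round-half-up of filled * (n_levels - 1) / (m * m).
            level = (2 * filled * (n_levels - 1) + m * m) // (2 * m * m)
            out_row.append(level)
        gray.append(out_row)
    return gray


fine = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
print(rasterize_to_gray(fine, m=2, n_levels=5))
```

With m = 2 and five gray levels, each filled fine-grid pixel contributes exactly one level, so a fully filled block maps to the maximum level and an empty block to 0.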
When this gray-level image is transmitted to the EBDW lithography system, the lithography writer interprets the gray level (or pixel intensity) as an exposure dose, which is controlled by exposing the corresponding region multiple times with an electron beam. Through this process the printed layout pattern becomes more robust to the electron beam proximity effect, which yields better-quality circuits.

Our approach is motivated by the compactness of the GDSII/OASIS format and uses a corner representation. However, we bypass the complex flattening and rasterizing processes and instead work with a simple decoding process. Yang & Savari (2011) considered some of these ideas for binary images, which handle the proximity correction by rasterizing the input binary image on a fine enough grid. Here we will extend these ideas to gray-level images on a coarser grid.
Figure 4 summarizes the components of the compression algorithm. We begin by applying a corner transformation to the image like the one in Corner2 [Yang & Savari (2011)]. However, unlike Corner2, this transformation outputs two streams: a "corner stream" and an "intensity stream". The corner stream is a binary stream which locates the polygon corners, and the intensity stream is a stream of pixel (corner/edge) intensities. Each stream is input to a separate entropy coding scheme which outputs a compressed bit stream. The corner stream is compressed using a combination of run length encoding [Golomb (1966)], end-of-block coding, and arithmetic coding [Moffat et al. (1998)]. The intensity stream is compressed using end-of-block coding and then compressed by LZ77 [Ziv & Lempel (1977)] and Huffman coding [Huffman (1952)].
In Section 2.2 we will first describe the corner transform process which outputs the corner stream and the intensity stream. In Section 2.3 we will describe the final entropy coding process of the corner stream, and in Section 2.4 we will describe how the intensity value is compressed.

Corner transformation
The GDSII/OASIS representation of a structurally flattened single layer describes the layout polygons by their corner points. This representation requires large decoder memory, since the decoder needs to access a memory block of size $(|x_1 - x_2| + 1) \times (|y_1 - y_2| + 1)$ to connect an arbitrary pair of points $(x_1, y_1)$ and $(x_2, y_2)$, as in Figure 5. Therefore this representation is infeasible for our application. However, the rasterizing process becomes much less complex if the angle of a contour line is constrained to a small set. Yang & Savari (2010) took advantage of horizontal and vertical contour lines and decomposed an arbitrary polygon into a collection of Manhattan polygons, i.e., polygons with right angle corners. This approach is effective because most components of circuit layouts are produced using CAD tools which design the circuit in a rectilinear space, and the non-Manhattan parts can also be described by Manhattan components. In this framework, the decoder scans the image in raster order, i.e., each row in order from left to right. When the decoder processes a corner it must determine whether it should reconstruct a horizontal and/or a vertical line. Observe that a corner is either the beginning of a line going to the right and/or down or the end of a line.

Yang & Savari (2011) more recently observed that a row (or a column) of the original binary layout image consists of alternating runs of 1s (fill) and runs of 0s (empty). Therefore it is more efficient to encode pixels where there are transitions from 0 to 1 (or 1 to 0) using symbol "1" and to encode the other places using symbol "0." Observe, as in Figure 6(b), that after applying this encoding in the horizontal direction to a collection of Manhattan polygons, the output consists of alternating runs of 1s and 0s in the vertical direction. To increase the compression we repeat this encoding in the other direction to obtain the final corner image.
In the binary corner transformation the final encoded image is binary and the "1"-pixels give information about the corners of the polygons. To describe the algorithm we begin with a two-step transformation process and then shorten it to a one-step procedure which requires less memory during the encoding process and is faster than the two-step transformation process.
The two-step transformation process begins with a horizontal encoding step in which we process each row from left to right. For each row, the encoder sets the (imaginary) pixel value to the left of the leftmost pixel to 0 (not filled). If the value of the current pixel differs from the preceding one we represent it with a "1" and otherwise with a "0." The second step inputs the intermediate encoded result to the vertical encoding process, in which each column is processed from top to bottom.

Based on the experimental success in Yang & Savari (2010) and Yang & Savari (2011) for binary layout images, it is natural to expect that a combination of the corner transformation for the outline of gray-level polygons and a separate representation for the intensity stream would outperform Block C4. Note that nLv-level gray images for this application have pixel intensity 0 (empty) outside the polygon outline, nLv − 1 (fully filled) inside the polygon outline, and an element of (0, nLv) along the polygon outline. Therefore we need only consider intensities along polygon corners and edges. Finally, in order to obtain the polygon outline using the corner transformation, we first have to map the gray-level image to a binary image. This is easily done by mapping all of the nonzero intensities to 1 (fill) and leaving the zero intensities (not fill) unchanged.
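The gray-to-binary mapping and the two-step transformation described above can be sketched as follows. This is an illustrative sketch rather than the authors' one-step procedure; the function names are hypothetical, and treating the imaginary pixel above the top row as 0 in the vertical step is an assumption made by symmetry with the horizontal step.

```python
def to_binary(gray):
    """Map every nonzero intensity to 1 (fill) and keep zeros unchanged."""
    return [[1 if v else 0 for v in row] for row in gray]


def horizontal_encode(img):
    """Mark each pixel that differs from its left neighbor with a 1."""
    out = []
    for row in img:
        prev = 0  # imaginary pixel to the left of the row is empty
        enc = []
        for v in row:
            enc.append(1 if v != prev else 0)
            prev = v
        out.append(enc)
    return out


def vertical_encode(img):
    """Mark each pixel that differs from its upper neighbor with a 1."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for x in range(width):
        prev = 0  # assumption: imaginary pixel above the column is 0
        for y in range(height):
            out[y][x] = 1 if img[y][x] != prev else 0
            prev = img[y][x]
    return out


def corner_transform(binary):
    """Two-step transformation: horizontal pass, then vertical pass."""
    return vertical_encode(horizontal_encode(binary))


gray = [
    [0, 0, 0, 0, 0],
    [0, 3, 7, 3, 0],
    [0, 3, 7, 3, 0],
    [0, 0, 0, 0, 0],
]
for row in corner_transform(to_binary(gray)):
    print(row)
```

For the single rectangle in this example, the corner image contains exactly four 1s. The markers for ending corners fall one pixel past the polygon because a transition is recorded at the first pixel after a run ends.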

Entropy coding - corner stream
The corner stream typically contains long runs of zeroes and is therefore well suited to compression algorithms like run length encoding [Golomb (1966)] and end-of-block (EOB) coding. Because the corner transformed image is a sparse binary image, the string read in raster order consists of ones and runs of zeroes. During the compression process, the transitional corners (ones) of the transformed image are written unchanged, but each run of zeroes is described by its run length via an M-ary representation, which we describe next. Define the new symbols "2", "3", ..., "M+1" to respectively represent the base-M symbols "$0_M$", "$1_M$", ..., "$(M-1)_M$". For example, if the transformed stream is "1 00000 00000 1 00000 0000 1 00000 00000 000" and M = 3, then the encoding of the stream is "1 323 1 322 1 333" because the run lengths are 10 ($= 101_3$), 9 ($= 100_3$), and 13 ($= 111_3$), and the symbols 2/3/4 respectively represent $0_3$/$1_3$/$2_3$.

We find that the addition of EOB coding helps represent the corner stream more efficiently. When the polygons are aligned and start/end at the same rows of the image, the resulting runs of zeroes could be longer than a multiple of the row width. Although this could be handled by choosing M sufficiently large, the memory requirements for the encoding and decoding of the final M-ary representation via arithmetic coding [Moffat et al. (1998)] call for a choice of M as small as possible in our restricted decoder memory setting. We observe that it is effective to divide each line into k blocks of length L, and we define a new EOB symbol "X". If a run of zeroes appears at the end of a block, we represent that run using the end-of-block symbol X instead of an M-ary representation. Hence the encoding for a line of zeroes is k X's instead of approximately $\log_M(kL)$ symbols. For example, if M = 2, k = 5, and L = 7, then the transformed stream "1000000 0000100 0000000 1000000 0000000" is described as "1X 3221X X 1X X," where 2/3 ($= 0_2$/$1_2$) is used for the binary representations of runs of zeroes.

We find that EOB coding results in long runs of "X"s, and it is useful to apply an N-ary run length encoding to these runs. For the previous example, if M = N = 2, k = 5, and L = 7, then the next description of the string is "1 4 3221 5 1 5," where 2/3 (or 4/5) handles the binary representation of runs of zeroes (or "X"s). Finally, we compress the preceding stream using the version of arithmetic coding offered by Witten et al. (1987), and the decoder in this case requires four bytes per alphabet symbol. Since we used M + N + 1 symbols, 4(M + N + 1) bytes were used for arithmetic decoding.
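The three symbol-level stages above (M-ary run lengths, EOB symbols, and N-ary run lengths of EOB symbols) can be sketched as follows; the final arithmetic coding stage is omitted. The function names are hypothetical. Two details are assumptions inferred from the worked examples rather than stated in the text: zero runs are coded separately within each block, and a run of r "X"s is written as the base-N digits of r − 1 (so that one "X" becomes "4" and two become "5" when M = N = 2).

```python
X = "X"  # end-of-block symbol


def to_base(n, base):
    """Most-significant-first digits of n >= 0 in the given base."""
    digits = [n % base]
    n //= base
    while n:
        digits.append(n % base)
        n //= base
    return digits[::-1]


def encode_zero_runs(bits, M):
    """Keep each 1; write each zero run as base-M digits shifted by 2."""
    out, run = [], 0
    for b in bits:
        if b == 0:
            run += 1
            continue
        if run:
            out += [d + 2 for d in to_base(run, M)]
            run = 0
        out.append(1)
    if run:
        out += [d + 2 for d in to_base(run, M)]
    return out


def encode_with_eob(bits, M, L):
    """Split into blocks of length L; a trailing zero run becomes X."""
    out = []
    for start in range(0, len(bits), L):
        block = bits[start:start + L]
        end = len(block)
        while end and block[end - 1] == 0:
            end -= 1
        out += encode_zero_runs(block[:end], M)
        if end < len(block):
            out.append(X)
    return out


def encode_eob_runs(symbols, M, N):
    """Replace each run of r X's by the base-N digits of r - 1,
    shifted by M + 2 (assumed convention, inferred from the example)."""
    out, run = [], 0
    for s in symbols + [None]:  # None is a sentinel that flushes the last run
        if s == X:
            run += 1
            continue
        if run:
            out += [d + M + 2 for d in to_base(run - 1, N)]
            run = 0
        if s is not None:
            out.append(s)
    return out


stream = [int(c) for c in "1000000" "0000100" "0000000" "1000000" "0000000"]
stage1 = encode_with_eob(stream, M=2, L=7)
stage2 = encode_eob_runs(stage1, M=2, N=2)
print(stage1)
print(stage2)
```

Under these assumptions the sketch reproduces the symbol strings of both worked examples in the text.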

Entropy coding - intensity stream
The corner stream contains no intensity information. Since we are applying row-by-row decompression (from left to right), the intensity values have to be given in that order. The intensity values that we require are for corner pixels and pixels on the edges. As we have mentioned earlier in Section 2.2, the pixels outside the polygons will have intensity 0 (empty) and pixels inside the polygon boundaries will have intensity nLv − 1 (fully filled).
To obtain better prediction we could apply linear prediction along the neighboring pixels as is done in Block C4. However, this approach requires the full information of the previous row, which translates into additional decoder memory. Therefore we instead apply EOB encoding to the pixels corresponding to horizontal/vertical edges, because the pixel intensity along an edge seldom changes unless oblique lines are used. We encode the intensity stream as in Algorithm 3. Note that in the algorithm ρ is the length of the intensity stream, which is determined at the end of the encoding process.

Algorithm 3 Intensity Stream Encoding
If the current pixel corresponds to a corner (Lines 4-5), the intensity is represented as is. If the current pixel corresponds to a horizontal edge pixel (Lines 6-16) which starts from the left pixel, check the run of that intensity. If the horizontal edge has constant pixel intensity throughout the entire edge, represent the intensity value followed by an end symbol ε and skip to the ending corner pixel (Lines 7-10). Otherwise, write the entire edge intensity as is (Lines 11-15). Similarly, if the current pixel corresponds to a vertical edge (Lines 17-29) which starts from the upper pixel, determine whether or not the pixel intensities are fixed throughout the vertical edge. If they are constant, then represent the intensity value followed by the end symbol ε (Lines 18-20) and reset the intensity values for the following rows (Lines 21-23) so that they are not processed in Lines 27-29. Otherwise, write the intensity value as is and proceed (Lines 24-26). Finally, the remaining vertical edge pixel intensities are written in Lines 27-29.

After the entire intensity stream has been processed, compress the output stream using LZ77 and Huffman coding. The LZ77 algorithm by Ziv & Lempel (1977) compresses the stream by finding matches from the previously processed data. When a pattern is repeated within the search region, it can be encoded using a short codeword. Huffman coding is used at the end of LZ77 to represent the LZ77 stream more efficiently. The combination of LZ77 and Huffman coding is widely used in a number of compression algorithms such as gzip. We used zlib [zlib (2010)] to implement it. The compression rates depend on the size of the LZ77 search region and the dictionary for the Huffman code. Because of the decoder memory restrictions we chose an encoder needing only 2,048 bytes of memory for the dictionary. 2,048 bytes is slightly less than the memory used to describe an entire row of our benchmark circuit. However, since we were applying this only to the intensity stream we were able to match more rows than Block C4.
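The end-symbol idea for a single edge, followed by LZ77 and Huffman coding through zlib, can be sketched as follows. This is a simplified illustration and not Algorithm 3: it handles one edge at a time and ignores the interleaving of corners, horizontal edges, and vertical edges across rows. The names `EPS`, `encode_edge`, `decode_edge`, and `compress_intensity` are hypothetical, the byte value chosen for ε is arbitrary, and mapping the 2,048-byte dictionary to zlib's `wbits=11` (a 2^11-byte window) is an assumption.

```python
import zlib

EPS = 255  # hypothetical reserved byte for the end symbol; must exceed nLv - 1


def encode_edge(values):
    """Write a constant edge as [value, EPS]; otherwise write it as is."""
    if len(values) >= 2 and all(v == values[0] for v in values):
        return [values[0], EPS]
    return list(values)


def decode_edge(stream, pos, length):
    """Read one edge of known length starting at pos.

    The edge length is known to the decoder from the corner positions.
    Returns the edge intensities and the next read position.
    """
    if pos + 1 < len(stream) and stream[pos + 1] == EPS:
        return [stream[pos]] * length, pos + 2
    return list(stream[pos:pos + length]), pos + length


def compress_intensity(symbols):
    """LZ77 + Huffman via zlib with an assumed 2,048-byte window."""
    comp = zlib.compressobj(9, zlib.DEFLATED, 11)
    return comp.compress(bytes(symbols)) + comp.flush()


edges = [[7, 7, 7, 7], [3, 5, 5], [9]]
coded = [s for edge in edges for s in encode_edge(edge)]
blob = compress_intensity(coded)
print(coded, len(blob))
```

Because the decoder already knows each edge length from the corner stream, the two-symbol form of a constant edge is unambiguous as long as no intensity value collides with the end symbol.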

Decoder
The decoder consists of an intensity stream decoder and a corner stream decoder, as in Figure 8. The intensity stream decoder is an entropy decoder which can be decomposed into a Huffman decoder and an LZ77 decoder. The corner stream decoder consists of an entropy decoder (an arithmetic decoder, a run length decoder, and an end-of-block decoder) and a corner transform decoder which reconstructs the polygons from the entropy decoder output. The corner transform decoder uses the output of the corner stream entropy decoder to reconstruct the polygon outlines and uses the output of the intensity stream decoder to reconstruct the polygon pixel intensity. The entire process works in a row-by-row fashion. Since each part of the decoding procedure (arithmetic decoding, run length decoding, end-of-block decoding, inverse corner transformation, LZ77 decoding, and Huffman decoding) is simple and works with restricted decoder memory, the entire decoder can be implemented in hardware. Note that the most complex part will be the arithmetic decoder, which is widely implemented in microcircuits [Peon et al. (1997)], and the other parts consist of simple branch, copy, and computation operations, as we will see in the following subsection.

Intensity stream decoder
Decompressing the intensity stream is straightforward. We apply LZ77 and Huffman decoding to obtain the ε-coded intensity stream. As we have mentioned in Section 2.4, the decoder requires 2,048 bytes of memory to decode the LZ77 and Huffman codes. The ε-coded intensity stream is passed on to the corner transform decoder for the final reconstruction. Note that the decoder does not decompress the entire compressed intensity stream at once but rather decompresses some number of ε-coded intensity symbols at the request of the corner transform decoder. The detailed decompression of the ε-coded intensity stream will be discussed at the end of the next subsection.
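On-demand decompression of this kind can be sketched with zlib's streaming interface. This is only an illustration of the idea, not the hardware decoder: the generator name `intensity_symbols`, the chunk size, and the use of `wbits=11` for the 2,048-byte window are assumptions.

```python
import zlib


def intensity_symbols(compressed, max_chunk=16, wbits=11):
    """Yield decoded symbols lazily, inflating at most max_chunk bytes
    of output at a time so the consumer drives the decompression."""
    inflater = zlib.decompressobj(wbits=wbits)
    pending = compressed
    while pending:
        out = inflater.decompress(pending, max_chunk)
        # Input that was not consumed because the output limit was hit.
        pending = inflater.unconsumed_tail
        yield from out
    # Emit any output still buffered inside the inflater.
    yield from inflater.flush()


data = bytes(range(32)) * 10
comp = zlib.compressobj(9, zlib.DEFLATED, 11)
blob = comp.compress(data) + comp.flush()

symbols = intensity_symbols(blob)
first_five = [next(symbols) for _ in range(5)]
print(first_five)
```

The consumer, here standing in for the corner transform decoder, pulls symbols one at a time, and only a bounded amount of output is produced per inflate call.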

Corner stream decoder - corner transform decoder
As we have mentioned earlier, the corner stream decoder consists of an entropy decoder and a corner transform decoder. The entropy decoder reverses the procedure of the entropy encoder of Section 2.3. It first reconstructs the run length and end-of-block encoded stream using the arithmetic decoder. Then, depending on the symbol, runs of zeroes (symbols $0_M, \ldots, (M-1)_M$), runs of EOBs (symbols $0_N, \ldots, (N-1)_N$), or the corners (symbol 1) are reconstructed. Finally, the output of the entropy decoder is a binary corner image.

In this section we will focus on the operation of the corner transform decoder and why it can run in a row-by-row fashion. This feature makes our approach well suited to the restricted memory available to an EBDW writer. In our corner transform decoder we use a row buffer BUFF to hold the status of the previous (decoded) row. It stores a binary representation of the status of each pixel and therefore consumes width bits of memory. "0" denotes 'no transition' while "1" denotes the 'transition' which delineates the starting/ending point of a vertical line. Moreover, since we need the polygon boundaries (corners and horizontal/vertical edge pixels), we use another row buffer CNR to hold the boundary status of the previous row. This also requires width bits of memory.

Since the algorithm is long, we have split its description into two parts, namely Algorithms 4 and 5. The input to the algorithm is the corner image, and the algorithm outputs the binary layer image and the corner map. The corner map shows whether a pixel in the binary layer image is outside all polygons (O), inside a polygon boundary (I), a corner (C), a horizontal edge pixel (H), or a vertical edge pixel (V), and it is used together with the intensity stream decoder to reconstruct the pixel intensity. The first part of the algorithm, illustrated in Algorithm 4, shows how the buffers are used to pass previous row information to the current row so that the decoding process can be applied in a row-by-row fashion. Lines 5-7 process the binary image buffer BUFF. If the buffer is filled, then the corresponding pixel is part of a vertical edge and it is filled. Lines 8-15 process the corner map buffer CNR. If the corresponding CNR pixel does not form a run, then the corresponding pixel above it was a vertical edge pixel (Line 10). The starting and ending points of the runs of 1s are interpreted as the corners (Line 12) and the other pixels in between are translated as horizontal edge pixels (Line 14).
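Why a single row buffer suffices can be seen from a minimal sketch of the inverse of the two-step transformation. This is not Algorithms 4 and 5: it reconstructs only the binary layer image and omits the CNR buffer and the corner map. The function name `decode_rows` is hypothetical, and the buffer here holds the horizontally encoded previous row, which is one way to read the role of BUFF.

```python
def decode_rows(corner_rows, width):
    """Reconstruct binary rows from a corner image, one row at a time.

    Only one row buffer of `width` bits is kept between rows.
    """
    buff = [0] * width  # transition state carried over from the previous row
    for corner_row in corner_rows:
        # Undo the vertical pass: a 1 in the corner row toggles the state.
        buff = [b ^ c for b, c in zip(buff, corner_row)]
        # Undo the horizontal pass: each transition toggles fill/empty.
        fill = 0
        row = []
        for transition in buff:
            fill ^= transition
            row.append(fill)
        yield row


corner_image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
]
for row in decode_rows(corner_image, width=5):
    print(row)
```

Each output row depends only on the current corner row and the buffer contents, so the decoder never needs more than one previous row in memory.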
... layout image, but Block C4 [Liu et al. (2007)] has a memory shortage/failure. We therefore divided the image in a way that enables the successful application of Block C4. CornerGray is implemented in C/C++ and Block C4 in C#. The experiments were conducted on a laptop computer with a 2 GHz Intel Core i7 CPU and 4 GB RAM. The decoder memory requirement of CornerGray (with parameters M = N = 64) was width × (2 + bits per pixel)/8 + 4(M + N + 1) + 2,048 = 8,514 bytes, while that of Block C4 was width × bits per pixel × 0.25 + 427 = 8,927 bytes. We require about 5% less decoder memory than Block C4. Tables 1 and 2 report the experimental results. The input image size was 6,800 × 7,128 (= 30,294,000 bytes) and the CornerGray parameters were N = M = 64 for all layers. The compression ratio in Table 1 and Table 2 is defined as $\frac{\text{Input File Size}}{\text{Compressed File Size}}$.
However, the results show that CornerGray is relatively weak for handling massively repeated patterns. Among the 13 layers, CornerGray did not perform well (compared to Block C4) for Layer2, Layer4, Layer8, and Layer10. These layers contained patterns which consist of a large array, and Layer8 and Layer10 in particular had complex patterns which were scattered. For these parts the complex LZ-based copying part of Block C4 resulted in better performance. Hence, more sophisticated pattern matching is required to improve CornerGray.
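The decoder memory figures quoted above can be checked with a short calculation. The value of 5 bits per pixel is not stated explicitly in the text; it is inferred here from the stated input size, since 6,800 × 7,128 × 5 / 8 = 30,294,000 bytes. The function names are hypothetical.

```python
def corner_gray_memory(width, bits_per_pixel, M, N):
    """width * (2 + bpp) / 8 bytes of row buffers, 4 bytes per arithmetic
    coding symbol, and a 2,048-byte LZ77/Huffman dictionary."""
    return width * (2 + bits_per_pixel) // 8 + 4 * (M + N + 1) + 2048


def block_c4_memory(width, bits_per_pixel):
    """Block C4 formula as quoted in the text."""
    return int(width * bits_per_pixel * 0.25) + 427


WIDTH, HEIGHT = 6800, 7128
BITS_PER_PIXEL = 5  # inferred from the stated file size, not given directly

input_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
cg = corner_gray_memory(WIDTH, BITS_PER_PIXEL, M=64, N=64)
c4 = block_c4_memory(WIDTH, BITS_PER_PIXEL)
saving = 1 - cg / c4

print(input_bytes, cg, c4, f"{saving:.1%}")
```

The computed saving is roughly 4.6%, which is consistent with the rounded figure of about 5% given in the text.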

Conclusion
In the previous section we saw that the algorithm CornerGray outperforms Block C4 and is considerably faster. The improvement in CornerGray over Block C4 is a result of different techniques. Our corner location approach is simpler than the context prediction used by Block C4 to handle the irregular parts of layer images. However, CornerGray needs a better pattern handling scheme for circuits which contain massively repeated patterns. We are currently trying to generalize the frequent pattern replacement component of Corner2 [Yang & Savari (2011)] in order to handle frequent patterns within binary layout images and expect similar compression improvements for gray-level images. The decoding operations for CornerGray include common decompression schemes which are widely implemented in hardware as well as simple branches, compares, and memory copies for the corner transformation part. Therefore our decoder can be deployed using hardware and is an approach to the data delivery problem of maskless lithography systems.
The second author was supported in part by NSF Grant CCF-1017303.
Fig. 8. Decoder Overview: Note that the decompressed corner stream is input into the image reconstructor in a row-by-row fashion and the intensity stream is actually ε-coded as in Section 2.4.