# A MONOLITHIC 1.25GBITS/SEC CMOS CLOCK/DATA RECOVERY CIRCUIT FOR FIBRE CHANNEL TRANSCEIVER

L. Wu, H. Chen, S. Nagavarapu, R. Geiger, E. Lee, W. Black

Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA

## ABSTRACT

This paper describes a monolithic CMOS clock/data recovery PLL circuit for a 1.25 Gbits/sec fibre channel transceiver. Features include a fully differential high-speed phase detector, a high-speed charge pump, and a 4-stage ring oscillator optimized to have enough tuning range to cover all process corners as well as temperature variation from $0^{\circ}\text{C}$ to $100^{\circ}\text{C}$. The circuit was designed in a 0.35 $\mu$m single-poly, triple-metal CMOS process. Simulations indicate that the core circuit together with the output buffer consumes 220 mW from a single 3.3 V supply, of which 60 mW is dissipated by the core circuit itself.

## 1. INTRODUCTION

In recent years, the speed of serial data communication has been raised to the Gbits/sec range by taking advantage of high-speed media such as fibre channel or Gigabit Ethernet (1000BASE-T). To solve the synchronization problem between data and clock, the most economical way is to embed clock information in the transmitted data stream. However, this approach requires a clock recovery circuit at the receiver end to recover the clock information and to resynchronize it with the incoming data. The difficulty of clock recovery is that a periodic clock signal must be recovered from non-periodic random data, which usually has no clock component in its spectrum. Moreover, this becomes extremely difficult when operating in the GHz range. Traditionally, GaAs and bipolar technologies have been preferred in such high-speed applications. However, the recent emergence of mature sub-micron CMOS technology makes it possible to extend CMOS into the GHz domain with its inherent advantages of low cost and low power.

The primary motivation for this design is to use sub-micron CMOS technology to implement a 1.25 Gbits/sec fibre channel transceiver with a Bit-Error-Rate (BER) < 10<sup>-12</sup>. The speed and BER requirements pose a major challenge for CMOS technology, especially for the CMOS clock recovery circuit. Consequently, the major task of this design was to explore high-speed CMOS topologies and techniques for clock recovery. Section 2 introduces the architecture and the special features of this design. Section 3 discusses each building block. Section 4 shows simulation results, and Section 5 presents conclusions.

## 2. SYSTEM ARCHITECTURE

In a 1.25 Gbits/sec fibre channel serial data communication system, the transceiver functions as follows. At the transmitter side, 10 bits of data are accepted in parallel at a 125 MHz rate, then serialized and transmitted to the fibre channel as a pair of differential Pseudo-ECL (PECL) level signals at a baud rate of 10 times 125 MHz (i.e. 1.25 GHz). At the receiver end, the clock recovery circuit should extract clock information from the incoming 1.25 Gbits/sec data stream and resample the incoming data at 1.25 GHz with a Bit-Error-Rate < 10<sup>-12</sup>. The retimed serial bit stream is then deserialized and converted down to 125 Mbits/sec on each of the 10 parallel output lines.

The major performance specification here is the Bit-Error-Rate. It deteriorates because of jitter of all kinds, which introduces sampling-time uncertainty and may cause the sampler to make a wrong decision. Therefore, a robust clock/data recovery circuit is the key part of the overall transceiver. Fig. 1 shows the block diagram of the designed clock recovery circuit.
It is basically a Phase-Locked-Loop composed of:

- (1) Phase detector: detects the phase difference between the data and the recovered clock;
- (2) Charge pump and loop filter: convert the high-frequency phase difference into the low-frequency control signal Vctrl for the VCO;
- (3) Voltage-Controlled-Oscillator: generates a clock that is aligned to the incoming data.

The recovered clock from the VCO is used to resample the incoming data and acts as the receiver system clock.

Fig. 1 Clock recovery system architecture

What makes a clock recovery PLL more difficult than a PLL for other applications is that a PLL normally tracks a *periodic* input clock with a *periodic* output, whereas a clock recovery PLL must extract a *periodic* clock signal from *non-periodic random* data. Whenever there is no data, there should be no frequency drift. Moreover, the data being transmitted is 8B/10B NRZ data. The spectrum of an NRZ signal has no frequency component at the bit rate, which is exactly the frequency of the clock signal to be recovered. This clock component needs to be created by the circuit described here.

Much effort was devoted to developing high-performance CMOS PLL building blocks. A fully differential high-speed phase detector was designed especially for NRZ clock recovery, a 4-stage ring oscillator with a tuning range sufficient to cover process corners was designed, and a high-speed charge pump with an on-chip loop filter was developed.

## 3. CLOCK RECOVERY PLL BUILDING BLOCKS

### 3.1 Fully Differential Phase Detector for NRZ clock/data recovery

The structure of the phase detector determines the function of the PLL as a clock recovery circuit. As mentioned before, the clock frequency needs to be created from NRZ data. Fig. 2(a) shows the phase detector used here<sup>[1][2]</sup>. It consists of two D flip-flops and two Gilbert multipliers. The incoming data is first sampled by CLK to get D1, which is aligned to CLK.
D1 is further delayed by 1/2 bit time to get D2. The "Up" signal is produced by D×D1, while the "Down" signal is D1×D2. Since D2 is always D1 delayed by 1/2 bit, the "Down" pulse width remains constant and serves as a reference. The pulse width of "Up" varies among the "early", "late" and "lock" situations, as shown in Fig. 2(b). Note that in "lock", the frequency of the "Up" and "Down" signals is 2 times the highest data transition frequency, which is exactly 2 × 625 MHz = 1.25 GHz. This creates the required clock component. If there is no transition in the incoming data, there is no output pulse to the charge pump, hence there is no change in the VCO control voltage and the VCO frequency is maintained. This is especially important for clock recovery.

Fig. 2 Fully differential phase detector, with timing in panel (b)

Both the D flip-flops and the multipliers use fully differential structures, which suit this application for several reasons:

- (1) By working at reduced voltage swing, they have the potential to operate at very high frequency;
- (2) Reduced current spikes on the power supply create less power-supply-induced jitter;
- (3) They can accept the PECL-level differential input data RX+ and RX- directly, so no interface circuit is needed;
- (4) The D flip-flop that produces D1 also implements the resampling of the incoming data by its clock. Thus the sampler in Fig. 1 is already combined into the phase detector: D1 is itself the resynchronized data, and no additional sampler circuit is needed.

### 3.2 Charge Pump and Loop Filter

The function of the charge pump and loop filter is to convert the high-frequency "Up" and "Down" signals into the low-frequency VCO control signal Vctrl.

Fig. 3 Charge pump and loop filter

When there are no "Up" and "Down" signals, there is no current output to the loop filter. When "Up" is high, Ma absorbs less current and more current flows in M3, and consequently in M4.
This results in a net current output to the loop filter that increases the frequency of the VCO. Similarly, when "Down" is high, the VCO frequency decreases. By working at appropriate voltage swings, this structure avoids fully switching Ma and Mb on and off, and thus it can work at high speed.

### 3.3 Voltage-Controlled-Oscillator (VCO)

A ring oscillator structure was chosen because it can be easily integrated. The core circuit is a 4-stage ring oscillator. At the input there is a bias generator which converts Vctrl into two bias voltages to adjust the VCO frequency. At the output, a buffer is needed because the recovered clock is used as the system clock of the whole receiver and therefore needs significant driving capability.

Fig. 4 VCO block diagram

#### (1) Ring oscillator delay cell

A fully differential delay cell is shown in Fig. 5(a)<sup>[3]</sup>.

Fig. 5 VCO delay cell

The frequency can be changed by adjusting BIASP and BIASN simultaneously. From small-signal pole-zero analysis, the VCO frequency can be expressed in terms of Vbiasn and Vbiasp as follows:

$$\omega = \frac{\sqrt{2}}{2C_p} \mu_p C_{ox} \sqrt{2\left(\frac{W}{L}\right)_4 \left[\left(\frac{W}{L}\right)_1 V_{effn}^2 - \frac{1}{2} \left(\frac{W}{L}\right)_5 V_{effp}^2\right]} \tag{1}$$

where:

- $\omega$ is the VCO frequency;
- $C_p$ is the total parasitic capacitance at the output node;
- $V_{effn}=V_{biasn}-V_{tn}$ and $V_{effp}=V_{dd}-V_{biasp}-|V_{tp}|$.

Moreover, the delay stage must satisfy the following condition in order to oscillate:

$$\frac{\sqrt{2}}{2} \frac{g_{m1}}{g_{m4}} > 1 \qquad \frac{1}{2C} \frac{g_{m1}}{\omega} > 1 \tag{2}$$

From (1) it can be seen that the larger Vbiasn is, the higher the frequency, while the smaller Vbiasp is, the lower the frequency. At the same time, (2) shows that the lower the frequency, the more easily the stage oscillates. Hence (1) and (2) conflict with each other. Therefore careful design must be undertaken to optimize the overall behavior.
By setting BIASN and BIASP appropriately, both the tuning range of the VCO and the power dissipation can be optimized.

#### (2) Bias generator

There are two bias variables to control, but the loop filter provides only one output voltage, Vctrl. An additional bias generator is therefore needed to generate BIASN and BIASP from Vctrl. This circuit, shown in Fig. 6, is basically a level shifter. A differential pair continuously compares Vctrl with a reference voltage to generate BIASN and BIASP. A start-up circuit needs to be added to ensure that the current converges to a non-zero state.

Fig. 6 VCO bias generator

VCO performance is summarized in Table I. It can be seen that at every corner the tuning range includes the 1.25 GHz target, so this VCO covers all process corners and temperature variation from $0^{\circ}$C to $100^{\circ}$C.

TABLE I. VCO Performance

| | Fast corner, T = 0°C | Normal, T = 65°C | Slow corner, T = 100°C |
|---|---|---|---|
| Tuning range | 100 MHz ~ 2.2 GHz | 60 MHz ~ 1.9 GHz | 50 MHz ~ 1.7 GHz |
| Power | 240 mW | 210 mW | 190 mW |
| Rise/fall time | 100 ps | 120 ps | 150 ps |

### 3.4 Loop Dynamics

The loop dynamics can be expressed by the loop bandwidth $\omega_n$ and the damping factor $\xi$ as follows:

Loop bandwidth:

$$\omega_n = \sqrt{\frac{I_{cp}}{2\pi} \cdot \frac{1}{C_p} \cdot K_{VCO}}$$

Loop damping factor:

$$\xi = \frac{R}{2} \sqrt{\frac{I_{cp}C_p}{2\pi} \cdot K_{VCO}}$$

where $I_{cp}$ is the charge pump current, $K_{VCO}$ is the VCO gain, and $R$ and $C_p$ are the loop filter resistance and capacitance (this $C_p$ is distinct from the parasitic capacitance in (1)). For our design, $I_{cp} = 15\,\mu\text{A}$, $C_p = 20\,\text{pF}$, $R = 5\,\text{k}\Omega$ and $K_{VCO} = 1.6\,\text{GHz/V}$, so the above two parameters are:

$$\omega_n \approx 2\,\text{MHz}$$

$$\xi \approx 0.7$$

## 4. SIMULATION RESULTS

This PLL clock/data recovery circuit has been fabricated along with the whole transceiver in a $0.35\,\mu$m single-poly, triple-metal CMOS technology. Simulation results show that this structure works well for 1.25 GHz clock/data recovery across all process corners and temperatures from $0^{\circ}$C to $100^{\circ}$C. For illustration, worst-case simulation results are shown in Fig. 7.
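As a numerical cross-check of the loop-dynamics expressions in Section 3.4, the short sketch below evaluates them with the quoted design values. It rests on two assumptions that the text does not state: $K_{VCO}$ is entered numerically as 1.6e9 (its value in Hz/V), and the result of the $\omega_n$ expression is read as rad/s and divided by $2\pi$ to obtain Hz. The function and variable names are illustrative only.

```python
import math


def loop_bandwidth(i_cp, c_p, k_vco):
    """Evaluate omega_n = sqrt(Icp / (2*pi) * (1 / Cp) * Kvco)."""
    return math.sqrt(i_cp / (2 * math.pi) * k_vco / c_p)


def damping_factor(i_cp, c_p, r, k_vco):
    """Evaluate xi = (R / 2) * sqrt(Icp * Cp / (2*pi) * Kvco)."""
    return (r / 2) * math.sqrt(i_cp * c_p / (2 * math.pi) * k_vco)


# Design values quoted in Section 3.4
I_CP = 15e-6    # charge pump current, A
C_P = 20e-12    # loop filter capacitance, F
R = 5e3         # loop filter resistance, ohm
K_VCO = 1.6e9   # VCO gain; assumption: used numerically in Hz/V

omega_n = loop_bandwidth(I_CP, C_P, K_VCO)   # assumption: interpreted as rad/s
f_n_hz = omega_n / (2 * math.pi)             # same quantity expressed in Hz
xi = damping_factor(I_CP, C_P, R, K_VCO)

print(f"omega_n = {omega_n:.3e} rad/s ({f_n_hz / 1e6:.2f} MHz)")
print(f"xi      = {xi:.2f}")
```

Under these assumptions the sketch gives roughly 2.2 MHz and 0.69, in line with the rounded values of 2 MHz and 0.7 quoted in Section 3.4. If $K_{VCO}$ were instead converted to rad/s/V before being inserted, the computed values would be larger by a factor of $\sqrt{2\pi}$, so the unit convention matters when reusing these expressions.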
The worst case in Fig. 7 means that both PMOS and NMOS transistors are at the slow corner and the temperature is 100°C, with the power supply reduced to 3 V. Fig. 7(a) shows the waveforms of the incoming PECL data and the recovered clock. It can be seen that in lock, the rising edge of the recovered clock is aligned to the center of the incoming data, which is the best sampling time. Fig. 7(b) shows the input PECL-level data and the resampled data D1 and D2 from the phase detector. Fig. 7(c) shows the recovered clock spectrum. Under the fast corner (T = 0°C, Vdd = 3.5 V) and under normal conditions, the situation is more favorable and similar results were achieved.

Fig. 7 Worst case simulation results: (a) incoming PECL data and recovered clock, (b) input PECL data and resampled data D1 and D2, (c) recovered clock spectrum

## 5. CONCLUSION

A fully monolithic PLL clock/data recovery circuit for a 1.25 Gbits/sec fibre channel transceiver was implemented in 0.35 µm CMOS technology. The high-speed CMOS PLL building blocks designed include a fully differential phase detector, a current-mirror charge pump and a 4-stage fully differential VCO. Simulation results show that the circuits function properly at 1.25 GHz with a power dissipation of 60 mW for the clock recovery circuit.

## 6. REFERENCES

- [1] Dan H. Wolaver, *Phase-Locked Loop Circuit Design*, Prentice Hall, p. 222, 1991.
- [2] Lin Wu and William C. Black, Jr., "A low jitter 1.25GHz CMOS analog PLL for clock recovery", Proceedings of ISCAS'98, June 1998.
- [3] John G. Maneatis, "Low jitter process independent DLL and PLL based on self-biased techniques", IEEE JSSC, November 1996.