Ok, this page is about the USB (Universal Serial Bus) and what it takes to make it talk to a PIC microcontroller.
Before we continue, I want to tell you what I was thinking at the time, and what you may be thinking right now.
If you want to talk to something over USB, there are two main scenarios: PIC-to-PC communication and, the more thrilling one, PIC-to-USB-device (dongle) communication. We won't talk about the first case, because it is trivial and there are plenty of options, like the PIC18F2550 or a USB-to-TTL adapter. The more interesting case is when you have a nice and CHEAP USB dongle and you want to talk to it. Now we come to the thinking... Well, I have a CHEAP and nice little dongle, so why not use it in my projects? Take a Bluetooth dongle, for example: you can find one for as little as $1.30 on eBay. Now compare that to the price of a Bluetooth-to-TTL adapter ($10 for the module + $9 for the adapter board). That is a long way from $1.30, and you start to think there is a plot against you: they just want to ROB you. And of course you are partly right, they really do want to rob you, but you can bet that using the dongle with a PIC is not that simple. The reason is the so-called USB Host Controller.
Copyright 2010-2011 Assen Nikolov and GEMLIT. All rights reserved.
Redistribution and use in source or binary forms, or incorporated into a physical (hardware) product, with or without modification, are permitted for non-commercial use only, provided that the following conditions are met:
ALL THE INFORMATION, TECHNOLOGY, AND SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL ASSEN NIKOLOV, GEMLIT OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE OR TECHNOLOGY, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
There are three types of USB devices: hosts, hubs and functions. USB is not a distributed network; it is strictly centralized, and the center is the HOST. The host initiates all the requests and controls all the data coming in or out. Two devices connected to a USB network can't just talk to each other; if they need to, they talk through the host as a middleman. And here we come to the reason why communication between your dongle and a PIC is kind of a pain in the ass: you need a Host Controller, but you don't have one.
One way of solving the problem of the missing USB Host Controller is to buy one. A single-chip USB Host Controller for about $10-12 is reasonable, and really the way to go if you are not short of resources ($), but if you have come to this page, this is probably not the case. The other way is to emulate a host controller in software, and this is what we are going to attempt. Just to keep in mind, there are already such projects, like USBTiny for the ATtiny MCU, but as far as I know it supports only USB LOW SPEED (1.5 Mbps), and we are going to try to build a USB 2.0 FULL SPEED (12 Mbps) host.
Ok, now some more technical stuff... USB FULL SPEED has a data rate of 12 Mbps, which is quite fast for a micro. I'm using the dsPIC30F/33F family, which has a maximum speed of 30-40 MIPS; now imagine you have to communicate at 12 Mbps. Even if you have your data already bit-stuffed and NRZI-encoded, you will need something like 6 cycles per bit (assuming you are putting out 1 bit at a time):
_write: BTSC  DATA, #bit      ; test your data bit
        GOTO  _set            ; bit is 1
        BCLR  PORTx, #bitY    ; send 0
        GOTO  _write
_set:   BSET  PORTx, #bitY    ; send 1
        GOTO  _write          ; (selecting the next data bit is omitted)
(This is not the brightest algorithm, but it will serve us well to illustrate the high-speed demands of the bus.) At 36 MIPS you get 3 PIC cycles for every USB bit (36/12 = 3), so you are already 3 cycles short. Of course, this is not an optimized routine, but if you are thinking of using just 2 I/O pins on the PIC to drive the D+/D- lines of the USB, I'm going to disappoint you: I don't think it is really possible, at least not at 12 Mbps.
So, that was the warm-up. As you can see, if we want to do it the cheap way, we'll need to sacrifice some pins and some simplicity. I suggest using an 8-bit I/O bus, one pin for a clock and a few more pins for other things.
Now it is time to say a few words about the USB physical layer. USB uses NRZI encoding, and the data travels on two complementary (differential) lines, D+ and D-, rather than on a single line referenced to a common ground; there is no clock line either. The signals on the D+/D- lines are about +/-2V, and the USB specification documents have tons of requirements about what a proper USB interface is, so I suggest using a cheap, ready-made USB transceiver (like the USB1T11A or USB1T20A from Fairchild Semiconductor). This will save you a lot of headaches and simplify the design, as it handles some of the low-level protocol requirements for you.
Once you have chosen, let's say, the USB1T11A, what remains for you to do is the encoding/decoding of the NRZI signal, the bit stuffing and the CRC. I'm working on a hardware implementation of these steps so that we don't put an extra burden on our MCU. (Update: after some time playing with 7400-family logic, I decided to use a CPLD, because of the complicated schematics and layout.) The CPLD of choice is the Xilinx XC9572XL. The smaller XC9536XL won't do the job, as it is too small.
We will begin with the receiving scenario.
On the RCV pin of the USB1T11A you receive your data as logic 0/1, which is great. Now we have to decode the NRZI first. The NRZI signal is generated from the raw data like this:
RAW DATA  - 1 0 0 1 0 1 1 0
NRZI DATA - 0 1 0 0 1 1 1 0
A 0 in the raw data triggers a level change in the NRZI signal on the clock tick, and a 1 means no change.
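To make the table above concrete, here is a small Python model of NRZI encoding and decoding. It is only a bit-level sketch, not the CPLD logic, and the function names are made up for illustration. It assumes the line was at level 0 before the first bit, which is the assumption that reproduces the table; on a real full-speed bus the idle level would be J (=1).

```python
def nrzi_encode(raw_bits, start_level=0):
    """0 -> toggle the line level, 1 -> keep the current level."""
    level = start_level
    line = []
    for bit in raw_bits:
        if bit == 0:
            level ^= 1
        line.append(level)
    return line


def nrzi_decode(line_levels, start_level=0):
    """Same level as the previous bit time -> 1, level change -> 0."""
    prev = start_level
    raw = []
    for level in line_levels:
        raw.append(1 if level == prev else 0)
        prev = level
    return raw


raw = [1, 0, 0, 1, 0, 1, 1, 0]
line = nrzi_encode(raw)
print(line)               # [0, 1, 0, 0, 1, 1, 1, 0]
print(nrzi_decode(line))  # [1, 0, 0, 1, 0, 1, 1, 0]
```

Decoding only compares each level with the previous one, so the receiver never needs to know the absolute polarity, just the starting level.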
As raw data may contain long strings of 1's, which produce no change in the NRZI signal and therefore nothing to synchronize on for potentially long periods of time, USB uses bit stuffing: a 0 is inserted after every 6 consecutive 1's, like so:
RAW DATA - 01011111110
STUFFED - 010111111010
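A minimal Python sketch of the stuffing rule and its inverse, working on lists of 0/1 values, reproduces the RAW/STUFFED example above. The names are hypothetical, and the error check in bit_destuff (a 1 where a stuffed 0 is expected) is an extra that is not described on this page.

```python
def bit_stuff(raw):
    """Insert a 0 after every run of six consecutive 1s."""
    out, ones = [], 0
    for bit in raw:
        out.append(bit)
        ones = ones + 1 if bit == 1 else 0
        if ones == 6:
            out.append(0)  # stuffed bit
            ones = 0
    return out


def bit_destuff(stuffed):
    """Drop the 0 that follows every run of six consecutive 1s."""
    out, ones, expect_stuffed = [], 0, False
    for bit in stuffed:
        if expect_stuffed:
            if bit != 0:
                raise ValueError("bit-stuffing violation: seven 1s in a row")
            expect_stuffed, ones = False, 0
            continue  # discard the stuffed 0
        out.append(bit)
        ones = ones + 1 if bit == 1 else 0
        if ones == 6:
            expect_stuffed = True
    return out


def to_bits(text):
    return [int(c) for c in text]


print("".join(map(str, bit_stuff(to_bits("01011111110")))))     # 010111111010
print("".join(map(str, bit_destuff(to_bits("010111111010")))))  # 01011111110
```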
To decode the RCV signal you must first decode the NRZI, then detect and discard the stuffed 0's, and then (optionally) check the CRC. The de-stuffed data is what you send to your MCU.
After reading through the protocol specification again, I stopped at the CRC fields. There are two CRCs in the USB protocol. The first is 5 bits wide and protects the Address+Endpoint fields of token packets and the frame number in the SOF packet (these fields are 11 bits wide). The second is 16 bits wide and protects the data in DATA0/DATA1 packets (we are not going to discuss high-speed).
I made some tests, and my most optimized code performed the CRC5 operation in about 110 instruction cycles (dsPIC33F), which is beyond any reasonable limit; and imagine what the CRC16 of the DATA would cost me, since the DATA could be as long as 8192 bits. So I decided to do it in the interface logic.
For details of the USB-specific CRCs, refer to section 8.3.5 of the USB 2.0 specification.
Here is an example:
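Data: 00001000111 (leftmost bit first), register initialized to 11111, polynomial 00101

Data bit | Feedback + shifted register | XOR with 00101? | Register
(start)  |                             |                 | 11111
0        | 1 11110                     | yes             | 11011
0        | 1 10110                     | yes             | 10011
0        | 1 00110                     | yes             | 00011
0        | 0 00110                     | no              | 00110
1        | 1 01100                     | yes             | 01001
0        | 0 10010                     | no              | 10010
0        | 1 00100                     | yes             | 00001
0        | 0 00010                     | no              | 00010
1        | 1 00100                     | yes             | 00001
1        | 1 00010                     | yes             | 00111
1        | 1 01110                     | yes             | 01011

In the second column, the first digit is the feedback (data bit XOR the register's highest bit), followed by the register shifted one step left with a 0 inserted.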
Ok, now some explanations.
The data to calculate the checksum for is 11 bits: 00001000111. The leftmost bit goes into the circuit first.
The hold/shift register is initially filled with 1's. The generator polynomial is 00101B (x^5 + x^2 + 1, with the x^5 term implicit).
First, take the current data bit (leftmost) and XOR it with the highest bit of the shift register. Next, shift the register one step left, inserting 0 into the lowest bit. If the result of the XOR was 1, XOR the shift register with the polynomial; if it was 0, do nothing. Then go back to step one with the next data bit.
This is the technique shown in the example; the final register value is 01011. We then invert it to get the CRC5, 10100.
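The steps above translate almost line for line into a bit-serial routine. This Python sketch is meant for checking your own test vectors, not as the CPLD design, and the function names are made up. The CRC16 polynomial x^16 + x^15 + x^2 + 1 (0x8005) is the one given in the USB 2.0 specification (section 8.3.5). Bits are passed in transmit order, so for DATA packets each byte has to be fed LSb first.

```python
def serial_crc(bits, width, poly):
    """Bit-serial CRC as described above; returns the final (un-inverted) register."""
    mask = (1 << width) - 1
    reg = mask                                  # register starts filled with 1s
    for bit in bits:                            # first transmitted bit first
        feedback = bit ^ (reg >> (width - 1))   # data bit XOR register MSB
        reg = (reg << 1) & mask                 # shift left, 0 into the LSB
        if feedback:
            reg ^= poly                         # XOR with the polynomial
    return reg


def usb_crc5(bits):
    """CRC5 field: inverted register, polynomial 00101B (x^5 + x^2 + 1)."""
    return serial_crc(bits, 5, 0b00101) ^ 0b11111


def usb_crc16(bits):
    """CRC16 field: inverted register, polynomial x^16 + x^15 + x^2 + 1."""
    return serial_crc(bits, 16, 0x8005) ^ 0xFFFF


data = [int(c) for c in "00001000111"]
print(format(serial_crc(data, 5, 0b00101), "05b"))  # 01011 (final register)
print(format(usb_crc5(data), "05b"))                # 10100 (inverted -> CRC5)
```

One loop handles both CRC types because only the width and the polynomial change, which is also why a single hardware generator with a CRC_TYPE select is practical.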
You'll need one 8-bit bus to your MCU and 3 additional pins for control signals. The LE pin of the CPLD might be connected directly to the IRQ pin; however, you must then guarantee that you have written the new data to the IO BUS(0:7) in time. Something very important is the FINISHED input to the CPLD. FINISHED should be set only after IRQ, only while in a DATA packet, and only when there is no more DATA to transmit: as the DATA packet has no length field, something has to tell the CPLD that no more data is available. On IRQ you should read/write data from/to the IO BUS. Reading and writing should be done within the same bit time (~83 ns from the moment you get IRQ=1, but it is better to do it almost immediately). When you have finished the data for a DATA packet, you should set the IO BUS to all 1's and set FINISHED. Also, for TOKEN packets you should send an all-0's CRC field, as the interface generates its own CRC5.
First the USB packet structure
TOKEN PACKET
SYNC     | PID      | ADDR+ENDP   | CRC5  | EOP
00000001 | PPPPpppp | AAAAAAAAAAA | CCCCC | xxJ

DATA PACKET
SYNC     | PID      | DATA    | CRC16             | EOP
00000001 | PPPPpppp | (BYTEs) | CCCCCCCC CCCCCCCC | xxJ
The packet identifier (PID) field comes right after the SYNC field, which is 8 bits wide. The PID field is also 8 bits wide and is transmitted LSb first. For simplicity, we will skip the integrity check of the PID field and use only its first two bits.
PID[1:0] - xx01B = TOKEN while xx11B=DATA (LSb received/transmitted first).
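As an illustration of the PID byte and the token layout above, here is a Python sketch that builds the bits of a TOKEN packet in transmit order, with the all-0's CRC5 placeholder the interface expects, and classifies a packet from its first two PID bits. The 4-bit PID values are from the USB 2.0 specification; the helper names are hypothetical, and EOP is left out because the interface generates it.

```python
# 4-bit PID values (USB 2.0 spec), a subset
PID_OUT, PID_IN = 0b0001, 0b1001
PID_DATA0, PID_DATA1 = 0b0011, 0b1011
PID_ACK = 0b0010

SYNC = [0, 0, 0, 0, 0, 0, 0, 1]  # 00000001, first transmitted bit on the left


def lsb_first(value, nbits):
    """Bits of value in USB transmit order (LSb first)."""
    return [(value >> i) & 1 for i in range(nbits)]


def pid_byte(pid):
    """Low nibble: PID; high nibble: its one's complement (check field)."""
    return (pid & 0xF) | ((~pid & 0xF) << 4)


def token_bits(pid, addr, endp):
    """SYNC + PID + ADDR(7) + ENDP(4) + CRC5 placeholder (all 0s)."""
    return (SYNC + lsb_first(pid_byte(pid), 8)
            + lsb_first(addr, 7) + lsb_first(endp, 4) + [0] * 5)


def pid_type(pid0, pid1):
    """Classify from the first two PID bits received, PID[0] then PID[1]."""
    return {(1, 0): "TOKEN", (1, 1): "DATA",
            (0, 1): "HANDSHAKE", (0, 0): "SPECIAL"}[(pid0, pid1)]


bits = token_bits(PID_IN, addr=5, endp=1)
print(len(bits), pid_type(bits[8], bits[9]))  # 32 TOKEN
```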
OK, as you may have already noticed, the part of the packet that must be CRC-checked starts at a fixed position: right after the PID field ends. And a packet always starts with a J-to-K transition (1 to 0 for USB full-speed).
The PID detector detects the SOP (start of packet), which enables the free-running counter. The DATA flows freely through the two rightmost flip-flops, and we suppress their clock once the first two bits of the PID field are in; they stay there until the next initialization of the detector. When the counter reaches 16 and the PID type requires a CRC, we enable the CRC_ENABLE signal (active low). CRC_TYPE is de facto the PID's second bit. We also use the free-running counter to generate the IRQ on every 8th bit of the DATA.
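To show what the PID detector and the free-running counter derive from a packet, here is a rough Python model working on already de-stuffed, de-NRZI'd bits that start with the first SYNC bit. It is an interpretation of the description above, not the CPLD equations: as in the text, only the first two PID bits are used, so a CRC is required when PID[0]=1 (TOKEN or DATA). It also assumes the IRQ is counted from the first SYNC bit, and the function name is made up.

```python
def pid_detector(bits):
    """Return (crc_start, crc_type, irq_positions) for one packet.

    bits: de-stuffed, de-NRZI'd bits, bits[0] = first SYNC bit.
    crc_type is only meaningful when crc_start is not None.
    """
    pid0, pid1 = bits[8], bits[9]          # latched first two PID bits
    crc_type = pid1                        # 0 = CRC5 (TOKEN), 1 = CRC16 (DATA)
    crc_start = 16 if pid0 == 1 else None  # CRC_ENABLE goes active after 16 bits
    irq = [i for i in range(len(bits)) if i % 8 == 7]  # every 8th bit
    return crc_start, crc_type, irq


sync = [0, 0, 0, 0, 0, 0, 0, 1]
out_pid = [1, 0, 0, 0, 0, 1, 1, 1]  # PID OUT, LSb first
print(pid_detector(sync + out_pid + [0] * 16))  # (16, 0, [7, 15, 23, 31])
```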
The bit (de)stuffing module does not really remove or insert bits itself; instead it counts the consecutive 1's and, after the sixth, suppresses the clock for one cycle. As we use this clock to control the I/O shift register, this is enough to stuff/de-stuff the data. All modules after the de-NRZI (for incoming data) and before the NRZI encoder (for outgoing data) use this clock, CLK12. The pulse that suppresses the original clock is cc0 (active low).
The EOP detector is simple. Its inputs come from the USB interface chip: Vm and Vp. When both of these are 0, we have the so-called SE0 condition on the USB lines. USB 2.0 full-speed uses a special pattern to mark the EOP (end of packet): [SE0][SE0][J]. Our circuit will detect [SE0][SE0][J] as well as [SE0][SE0][K], but as the latter is not a valid pattern we do not expect it to happen. The EOP pulse is one bit long. The last flip-flop in the chain aligns the pulse to the de-NRZI data, which is at least 1/2 period (42 ns) behind RCV and, respectively, Vp/Vm.
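A small Python sketch of the detection rule, with one (Vp, Vm) sample per bit time, looks like this. It assumes the full-speed convention J = (Vp=1, Vm=0) and K = (0, 1). Like the circuit described above, it accepts any non-SE0 state after two SE0s, so [SE0][SE0][K] would also be reported. The names are hypothetical, and the half-period alignment flip-flop is not modeled.

```python
J, K, SE0 = (1, 0), (0, 1), (0, 0)  # (Vp, Vm) at full speed


def find_eop(samples):
    """Index of the sample that completes [SE0][SE0][not SE0], or None.

    On a valid bus the third state is J.
    """
    for i in range(len(samples) - 2):
        if samples[i] == SE0 and samples[i + 1] == SE0 and samples[i + 2] != SE0:
            return i + 2
    return None


print(find_eop([K, J, K, K, SE0, SE0, J]))  # 6
print(find_eop([K, SE0, J]))                # None (a single SE0 is not an EOP)
```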
The NRZI decoder works in two modes. When MODE=0, it is in receiving (decoding) mode: it aligns and decodes the input data. In the second mode (MODE=1) it only delays the incoming data by 1/2 period (42 ns), as that data does not need decoding and is already aligned. We pass the MODE=1 data through the NRZI decoder so that the decoded data and the data from the MCU are in phase. This is important because the PID detector does not take MODE into account.
The transmitter module is the actual interface between the stuffed, NRZI-encoded data and the USB interface chip (USB1T11A). One thing you should notice here is the EOP_START signal (active high): when it is high, the module sends the EOP pattern [SE0][SE0][J].
Ok, this one is really simple. We have two data sources: the actual data from the MCU, and CRC_OUT from the CRC GENERATOR. For outgoing data (MCU -> USB) we do not generate the CRC in the MCU; we do it in the interface. The c27, FINISHED and PACKET_TYPE signals control the output data. The idea is to replace the original CRC field from the MCU (which should be all 0's) with the CRC_OUT output.
c27 controls when to insert CRC_OUT for the TOKEN packet type, while FINISHED controls it for the DATA packet type.
The first flip-flop does the bit stuffing. The actual 0-bit insertion happens on the cc0 pulse (active low), so ANDing the incoming data with cc0 results in 0 exactly when we should stuff a bit. The NRZI encoder is controlled by the stuffed data: it changes its output state only when the controlling data is 0. We use SOP to set the default value of the NRZI signal (=1), which is [J] in USB 2.0 full-speed mode.
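Putting the stuffing and NRZI rules together, this Python sketch mimics the output path one bit time at a time: a counter of consecutive 1's plays the role of cc0, forcing a 0 onto the line and holding the data source for one bit, and the NRZI level starts at J (=1) as SOP sets it. It is a behavioral model under those assumptions, not the actual flip-flop netlist, and the function name is made up. It also emits a stuffed 0 when the data ends with six 1's, as the USB bit-stuffing rules require.

```python
def transmit(data_bits):
    """Return line levels (1 = J, 0 = K) for data bits that already carry their CRC."""
    line = []
    level = 1  # SOP presets the NRZI output to J
    ones = 0   # consecutive 1s actually sent
    i = 0
    while i < len(data_bits) or ones == 6:
        cc0 = 0 if ones == 6 else 1  # active low: stuff during this bit time
        if cc0:
            bit = data_bits[i]
            i += 1                   # clock runs: take the next data bit
        else:
            bit = 0                  # data AND cc0 = 0, data source held
        ones = ones + 1 if bit else 0
        if bit == 0:
            level ^= 1               # NRZI: toggle on 0, hold on 1
        line.append(level)
    return line


print(transmit([int(c) for c in "01011111110"]))
# [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1]
```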
This is a 5-bit general-purpose counter unit. It generates two signals: c27 and EOP_START. c27 is used to switch the CRC generator into "CRC_OUT" mode. EOP_START is used in the transmitter module. The counter is initialized asynchronously by SOP and by the rising edge of FINISHED. There are three conditions that trigger EOP_START.
Ok, this is one of the most important modules. Even if you think you can do without the CRC for received data, assuming everything is ok, you can't really do without a correct CRC field for transmitted data. So if you can do your CRC in the MCU, you can probably skip this module and make your life easier; however, I've already done it in hardware, so I think you should take advantage of that.
So, this module generates CRC5 and CRC16 values for the data passing through it.
The CRC mode is selectable via the CRC_TYPE signal (0 = CRC5; 1= CRC16).
INIT is tied to CRC_ENABLE from the PID DETECTOR. The CRC generator is held in its initial state while CRC_ENABLE=1 and begins to shift when CRC_ENABLE=0.
When in receiving mode, STOP_SE0 is used to halt the CRC generation.
CLK_SELECT is used to switch from CRC_GENERATION to CRC_OUT mode.
CRC_OK is valid after a packet has been received and stays valid until the next packet is received.
CRC_OUT is aligned to the input data, so replacing the original CRC field is not a problem.
For the CRC algorithm, see the explanation before the schematics section of this page. Now some words about the implementation. First, I use a 24 MHz clock for the circuit, to simplify the CRC generator's control signals (SEL, CE). We delay the SELect signal by 1/2 of a 24 MHz period. The SEL signal selects whether to XOR or SHIFT the CRC registers. As you have seen in the algorithm, two actions are repeated forever: 1-SHIFT, 2-(XOR or nothing), 1-SHIFT, 2-(XOR or nothing), and so on. When SEL=1 we perform a SHIFT, and when SEL=0 we do the XOR. The XOR itself is nothing more than an inversion of the bits marked with 1 in the CRC polynomial, so XOR with 00101B just means inverting the bits at position 0 and position 2. As we said, when SEL=0 we do an XOR, but the bits that do not need to change simply have their output fed back to their input without inversion. When "nothing" rather than an XOR is required, we do not invert any of the bits in the shift register. This is controlled by CE, which is the result of DATA XOR bit5/17, depending on the selected CRC type. It is important that SEL is delayed by 1/2 of a 24 MHz period, so that the rising edge of the clock does not coincide with a change of SEL.
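The two-phase idea can be checked in software as well. This Python sketch splits every data bit into a SHIFT step (SEL=1) into a register one bit wider than the CRC, and an XOR step (SEL=0) that inverts only the bit positions set in the polynomial when CE = DATA XOR the shifted-out bit is 1. The extra-wide register is an interpretation of "bit5/17", not something taken from the schematic, and all names are made up. For the example on this page it ends with the same register value, 01011.

```python
def two_phase_crc(bits, width, poly):
    """Return (final_register, trace) using separate SHIFT and XOR phases."""
    mask = (1 << width) - 1
    reg = mask                          # all 1s after INIT
    trace = []
    for data in bits:
        # Phase SEL=1 (SHIFT): (width+1)-bit value, top bit = old MSB
        shifted = reg << 1
        ce = data ^ (shifted >> width)  # CE = DATA XOR shifted-out bit
        # Phase SEL=0 (XOR): bits where poly has a 1 are inverted if CE=1,
        # every other bit is fed back unchanged
        reg = (shifted & mask) ^ (poly if ce else 0)
        trace.append((ce, format(shifted, f"0{width + 1}b"), format(reg, f"0{width}b")))
    return reg, trace


reg, trace = two_phase_crc([int(c) for c in "00001000111"], 5, 0b00101)
print(format(reg, "05b"))  # 01011
print(trace[0])            # (1, '111110', '11011')
```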
OK, this is a very simple 8-bit shift register with synchronous loading. When LE (load enable) is high, the data is taken from D(0:7); when LE=0, the data is shifted one step and the lowest bit is set to SLI.
If you look at the master schematic you'll see the tristate buffer, which lets you use the shift register as a bidirectional port. Also, LE should in fact be connected directly to IRQ, but this is not possible with the XC9572XL device, since this connection alters the equations and they no longer fit in the chip.
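A behavioral Python sketch of that register, one call per clock edge, may help when writing the MCU side. The class and argument names are hypothetical, and which end is wired to the serial output pin is not modeled, because the text does not say.

```python
class ShiftRegister8:
    """8-bit shift register with synchronous load (LE) and serial input (SLI)."""

    def __init__(self):
        self.q = 0  # contents, bit 0 = lowest bit

    def clock(self, le, d=0, sli=0):
        """Apply one clock edge and return the new contents."""
        if le:  # LE=1: parallel load from D(0:7)
            self.q = d & 0xFF
        else:   # LE=0: shift one step, SLI into the lowest bit
            self.q = ((self.q << 1) | (sli & 1)) & 0xFF
        return self.q


sr = ShiftRegister8()
sr.clock(le=1, d=0b10110001)
print(format(sr.clock(le=0, sli=1), "08b"))  # 01100011
```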
Well, this is nothing new, just an SMD implementation of Xilinx's Parallel Cable III schematic.
The board is designed for simplicity: it is SMD, with only two wires on the top side and a ground plane. You can make it single-sided, but take care with the grounds.
If you prefer a double-sided through-hole design, there are schematics and board layouts you can use in this post.
It turns out that I have a Centronics parallel cable (male DB-25 to male 36-pin Centronics), so I had to modify the layout a little. Here is the new layout. It is optimized for a single-sided PCB; this is the bottom side as seen from the top, so the printing mask should be mirrored.
Here are the layouts for PCB SIDE 2 and the SOLDER MASK pattern. They are for the NEGATIVE UV DRY FILM method; if you're using the TONER TRANSFER (HOT IRON) method, you'll need this mask inverted. The file is 600 dpi, so be careful when printing it.