[Milkymist-devel] The dungeons of NORia: Meeting the Balrog
elecktron at hotmail.com
Fri Oct 28 18:07:12 PDT 2011
Nice job capturing all this!
What about a multi-voltage supervisor for the 1V2, 2V5 and 3V3 rails, such as the STM6179, rather than relying on an unregulated 5V from a wallwart?
And perhaps you should use a 12V wallwart and an on-board 5V switching (pre-)regulator? This would loosen the constraints on the wallwart, but would it then create a backwards compatibility issue?
> Date: Fri, 28 Oct 2011 21:39:44 -0300
> From: werner at almesberger.net
> To: devel at lists.milkymist.org
> Subject: [Milkymist-devel] The dungeons of NORia: Meeting the Balrog
> The exploration of the dungeons of NORia has finally led to a
> meeting with the supposed arch-enemy: the power-down behaviour of
> the reset circuit.
> M1rc3 has a special reset chip (U24, [1]) that resets FPGA and NOR
> when powering up and that also holds them in reset when the 3.3 V
> rail drops below 2.63 V. The expectation was that this would
> prevent the NOR corruption. Alas, it didn't.
> After poking around for a while, we started to suspect that, when
> powering down, the 3.3 V rail may drop more slowly than some of
> the other rails - particularly any of the power rails supplying
> the FPGA core.
> In this case, the FPGA could get confused and send out weird
> signals, which would then be properly amplified by the FPGA's I/O
> drivers (operating at 3.3 V) and received by the NOR (also
> operating at 3.3 V). Every once in a while, these signals would
> form a valid command that the NOR still has enough time to
> process before it also loses power.
> Power rails can drop at different speeds because each has its own
> regulator and output buffering. It's not trivial to assure that
> rails come up or down in a specific order and it's also difficult
> to measure the exact order, because it can vary a lot with what
> the system is doing at the time of the power cut.
> However, we know that no power rail can drop faster than the power
> input: if a rail did drop faster, the regulator could simply draw
> more power from the input to bring the rail back up.
> Thus the idea was born to drive the reset chip not from the
> regulated 3.3 V rail but from the filtered but unregulated 5 V
> input. Also, to make sure we cut out in time, the threshold
> voltage of the reset chip should be closer to 5 V.
>
> The rework
>
> I removed the old reset chip and replaced it with an
> APX803-44SAG-7 [2], which has a threshold voltage of 4.38 V. To
> isolate the input pin from the 3.3 V pad on the PCB, I placed a
> piece of single-sided 0.36 mm FR4 board [3] between chip and pad.
> The closest 5 V source I could find is C125, part of the MIDI TX
> This is what it looks like:
>
> M1 behaviour after rework
>
> Immediately after the rework, the M1 behaved a little oddly. It
> did reset and enter standby, but when I tried to get into the BIOS
> to run the CRC test, it just stopped (maybe a spurious reset).
> I'm not sure what happened there. Later, I checked the voltages,
> and they're all good: 4.98 V at the DC jack and 4.94 V at U24 pin
> Eventually, it gave in and behaved properly. I then proceeded to
> run the usual power-cycling loop.
> I ran the power-cycling test for 4284 cycles. It did not report a
> single corruption.
> Afterwards, I did a CRC check, which also showed that everything
> was in good health (*). Last but not least, I dumped the lock bits
> and verified that block 0 was indeed unlocked.
> This means that the test seems to be valid. If we assume a
> previous corruption probability of 1/500 per cycle, the
> probability of passing 4284 cycles without hitting a single
> corruption would be about 0.02%.
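
The 0.02% figure above can be checked with a few lines of Python. This is only a sketch: it assumes every power cycle corrupts the NOR independently with the same probability, and the function name pass_probability is made up for this illustration.

```python
def pass_probability(p_corrupt: float, cycles: int) -> float:
    """Chance of seeing zero corruptions in `cycles` power cycles,
    assuming each cycle fails independently with probability p_corrupt."""
    return (1.0 - p_corrupt) ** cycles


# Figures from the post: assumed 1/500 per cycle, 4284 clean cycles.
p = pass_probability(1 / 500, 4284)
print(f"probability of a clean run by luck: {p * 100:.3f}%")
```

The result lands just under 0.02%, which agrees with the rounded value quoted above.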
> (*) In case you're checking my log [4]: the rescue BIOS failed the
> CRC check. I think it's the MAC address that causes the CRC to
> fail. I never bothered to fix this, so that failure is normal
> and expected.
> It seems that changing the reset circuit such that it always
> resets FPGA and NOR when power is ramping down does reduce the
> rate of NOR corruptions substantially and may even eliminate the
> problem entirely.
> The instabilities observed immediately after the rework need
> further examination. They may have been caused by residues of the
> rework (e.g., flux that hasn't dried completely), but another
> possible explanation would be short voltage drops on the 5 V rail
> during load changes.
> We may also consider using a reset chip with a lower threshold
> voltage. E.g., the APX803-40SAG-7 with a nominal threshold of
> 4.0 V should still give the 3.3 V regulator [5] enough room to do
> its work, while being less sensitive to small upsets of the 5 V
> rail.
>
> What's next
>
> I'll play with my M1 in "regular use" for a bit and watch for
> unexplained resets/hangs/etc.
> After that, a longer test run should provide more certainty that
> the corruption is really gone. The odds of passing by mere luck
> shrink roughly exponentially with the number of clean cycles, and
> every 5-6 hours adds a factor of ten. So a couple of days should be
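
The "factor of ten every 5-6 hours" estimate can be related to a cycle count with a short calculation. This sketch again assumes independent cycles and the 1/500 per-cycle corruption probability used earlier. The per-cycle time it prints is only what those two figures imply together; it is not a number stated in the post, and cycles_per_decade is a name invented for this example.

```python
import math


def cycles_per_decade(p_corrupt: float) -> float:
    """Clean cycles needed to make a lucky pass ten times less likely,
    assuming independent cycles with corruption probability p_corrupt."""
    return math.log(10) / -math.log1p(-p_corrupt)


n = cycles_per_decade(1 / 500)
print(f"cycles per factor of ten: {n:.0f}")

# Implied duration of one power cycle if a decade takes 5 to 6 hours.
for hours in (5, 6):
    print(f"{hours} h per decade -> {hours * 3600 / n:.1f} s per cycle")
```

At about 1150 cycles per factor of ten, the 4284 cycles already run correspond to a bit under four such factors.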
> Last but not least, this needs testing with the supply voltage at
> its limits, e.g., the 4.75 V to 5.25 V allowed for a USB host.
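
To see how the two reset chip options sit relative to the voltages mentioned in the post, here is a small margin calculation. It uses nominal thresholds only. Threshold tolerance, hysteresis and the regulator's dropout voltage are not given in the post and are deliberately left out, so treat this as rough arithmetic rather than a design check. All names in the code are made up for this illustration.

```python
# Nominal values taken from the post; tolerances are NOT modelled.
THRESHOLDS_V = {"APX803-44SAG-7": 4.38, "APX803-40SAG-7": 4.0}
RAIL_V = 3.3          # regulated rail feeding FPGA I/O and NOR
USB_MIN_V = 4.75      # lower supply limit quoted for a USB host
MEASURED_U24_V = 4.94  # voltage measured at U24 after the rework


def margins(threshold_v: float) -> dict:
    """Distances (in volts) between a reset threshold and key voltages."""
    return {
        "below_usb_min": round(USB_MIN_V - threshold_v, 2),
        "below_measured": round(MEASURED_U24_V - threshold_v, 2),
        "above_rail": round(threshold_v - RAIL_V, 2),
    }


for part, vth in THRESHOLDS_V.items():
    print(part, margins(vth))
```

The lower-threshold part roughly doubles the distance to the 4.75 V supply limit (0.75 V instead of 0.37 V) while still tripping 0.7 V above the nominal 3.3 V rail.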
> [1] http://www.ait-ic.com/uploads//2009-10/21/_1256089836_7ol2c.pdf
> [2] http://www.diodes.com/datasheets/APX803.pdf
> [3] http://search.digikey.com/us/en/products/PC94/PC94-ND/354417
> [4] http://downloads.qi-hardware.com/people/werner/m1/nor/d8/raw.tar.bz2
> [5] http://www.national.com/profile/snip.cgi/openDS=LP38690
> - Werner