As a user it doesn't really matter to me if loading a tape takes 2 seconds or 5 seconds - either way it is nothing compared to the loading times back in the day. Still, I always want to improve my emulator if I can, and tape flash loading has not been as fast as it apparently can be, when you compare it to other emulators. Also, as in the case of the Bad Apple 2 demo, some tape files won't even load correctly without efficient flash loading. So I decided to have one more look at the flash loading routines and found that I had made things much more complicated than I had to, and that a simpler solution was much faster!
Previously, the emulator used the header information to flash load the following data block, but the header itself was never flash loaded. If speed load and edge detection are activated, loading a header is fast anyway, but there is still a noticeable delay. Also, this method required separate routines for loading program blocks, data blocks and headerless blocks. After analyzing the ROM routines (using this excellent site) and doing some experimenting I found that by intercepting the LD BYTES ROM routine the emulator could flash load every tape block, regardless of type, so everything could be handled by the same routine - and much faster.
Below is the new flash loading routine in a static class, which is called by the Controller class after each instruction, when tape loading is active. The actual flash load routine is very simple, and the bulk of the code is needed to handle different errors that can occur. In short, the process consists of the following steps:
    namespace SP48
    {
        /// <summary>
        /// Loads tape data directly into RAM, bypassing the ROM loading routines.
        /// </summary>
        public static class TapeFlashLoader
        {
            /// <summary>
            /// Checks if the program counter is at an entry point in the ROM tape loading routines
            /// and attempts to flash load the next tape block.
            /// </summary>
            /// <returns>
            /// An integer representing the index of the <see cref="TapeBlock"/> that was flash loaded. A return
            /// value of -1 means that flash loading could not be performed.
            /// </returns>
            public static int FlashLoad(Z80 z80, Memory memory, TapeManager tapeManager)
            {
                // The return value, representing the index of the TapeBlock which was flash loaded.
                int lastBlockIndex = -1;

                // The LD BYTES ROM routine is intercepted at an early stage just before the edge detection is started.
                // Check that the tape position is at the end of a block and that there is a following block to flash load.
                if (z80.PC == 0x056A && tapeManager.NextBlock != null && tapeManager.CurrentTapePosition > tapeManager.NextBlock.StartPosition - 10)
                {
                    // The target address for the data is stored in IX.
                    int dataTargetParameter = 256 * z80.I1 + z80.X;

                    // The block length (number of bytes) is stored in DE.
                    int dataLengthParameter = 256 * z80.D + z80.E;

                    // The flag byte is stored in A'.
                    int flagByte = z80.APrime;

                    // Check for various errors:
                    // Is there a mismatch between the block type and the flag byte in the A' register?
                    if (tapeManager.NextBlock.BlockTypeNum != flagByte && dataLengthParameter > 0)
                    {
                        // Don't load the block, but reset all flags.
                        z80.CarryFlag = 0;
                        z80.SignFlag = 0;
                        z80.ZeroFlag = 0;
                        z80.HalfCarryFlag = 0;
                        z80.Parity_OverflowFlag = 0;
                        z80.SubstractFlag = 0;
                        z80.F3Flag = 0;
                        z80.F5Flag = 0;

                        // The A register is updated by XOR:ing the flag byte with the block byte read from the file.
                        z80.A = flagByte ^ tapeManager.NextBlock.BlockTypeNum;
                    }
                    // Is the expected number of bytes larger than the actual length of the block?
                    else if (dataLengthParameter > tapeManager.NextBlock.BlockContent.Length)
                    {
                        // If the DE register indicates a too long data length, the loader will fail after
                        // loading the block and it expects one more byte.
                        memory.WriteDataBlock(tapeManager.NextBlock.BlockContent, dataTargetParameter);

                        // When a new edge is not found, flags carry = 0 and zero = 1.
                        z80.CarryFlag = 0;
                        z80.ZeroFlag = 1;

                        // The other flags are set by the last INC B (from 0xFF) at 0x05ED.
                        z80.HalfCarryFlag = 1;
                        z80.SignFlag = 0;
                        z80.Parity_OverflowFlag = 0;
                        z80.SubstractFlag = 0;
                        z80.F3Flag = 0;
                        z80.F5Flag = 0;

                        // Check that we're not dealing with a data fragment (in which case IX and DE are intact).
                        if (tapeManager.NextBlock.BlockContent.Length >= 2)
                        {
                            z80.I1 = (dataTargetParameter + tapeManager.NextBlock.BlockContent.Length + 1) / 256;
                            z80.X = (dataTargetParameter + tapeManager.NextBlock.BlockContent.Length + 1) - 256 * z80.I1;
                            z80.D = (dataLengthParameter - (tapeManager.NextBlock.BlockContent.Length + 1)) / 256;
                            z80.E = (dataLengthParameter - (tapeManager.NextBlock.BlockContent.Length + 1)) - 256 * z80.D;
                        }
                        z80.A = 0;
                    }
                    // Is the expected number of bytes smaller than the length of the block?
                    else if (dataLengthParameter < tapeManager.NextBlock.BlockContent.Length)
                    {
                        memory.WriteDataBlock(tapeManager.NextBlock.BlockContent, dataTargetParameter);

                        // When calculating the checksum for the loaded data, there are two different cases,
                        // either the block length parameter equals zero, in which case there is no parity
                        // calculated and the checksum contains the flag byte.
                        // Otherwise, the checksum is calculated in the usual way but only for the number
                        // of bytes specified in the data length parameter + 1.
                        int calculatedCheckSum;
                        if (dataLengthParameter == 0)
                            calculatedCheckSum = flagByte;
                        else
                        {
                            calculatedCheckSum = flagByte;

                            // The checksum is calculated by XOR:ing each byte of data with the flag byte.
                            for (int i = 0; i < dataLengthParameter + 1; i++)
                                calculatedCheckSum ^= tapeManager.NextBlock.BlockContent[i];
                        }

                        // The flags are set by the CP 0x01 operation at 0x05E0, where A = the current checksum.
                        int flagTest = calculatedCheckSum - 1;
                        z80.CarryFlag = BitOps.GetBit(flagTest, 8);
                        z80.SignFlag = Flags.SignFlag(flagTest);
                        z80.ZeroFlag = Flags.ZeroFlag(flagTest);
                        z80.HalfCarryFlag = Flags.HalfCarryFlagSub8(calculatedCheckSum, 1, z80.CarryFlag);
                        z80.Parity_OverflowFlag = Flags.OverflowFlagSub8(calculatedCheckSum, 1, flagTest);
                        z80.SubstractFlag = 1;
                        z80.F3Flag = 0;
                        z80.F5Flag = 0;

                        // Update the A, IX and DE registers.
                        z80.A = calculatedCheckSum;
                        z80.I1 = (dataTargetParameter + dataLengthParameter) / 256;
                        z80.X = (dataTargetParameter + dataLengthParameter) - 256 * z80.I1;
                        z80.D = 0;
                        z80.E = 0;
                    }
                    // Is the expected number of bytes equal to zero?
                    else if (dataLengthParameter == 0)
                    {
                        // There is no parity check, so the checksum contains the block type value.
                        int calculatedCheckSum = 0xFF;

                        // The flags are set by the CP 0x01 operation at 0x05E0, where A = the current checksum.
                        int flagTest = calculatedCheckSum - 1;
                        z80.CarryFlag = BitOps.GetBit(flagTest, 8);
                        z80.SignFlag = Flags.SignFlag(flagTest);
                        z80.ZeroFlag = Flags.ZeroFlag(flagTest);
                        z80.HalfCarryFlag = Flags.HalfCarryFlagSub8(calculatedCheckSum, 1, z80.CarryFlag);
                        z80.Parity_OverflowFlag = Flags.OverflowFlagSub8(calculatedCheckSum, 1, flagTest);
                        z80.SubstractFlag = 1;
                        z80.F3Flag = 0;
                        z80.F5Flag = 0;
                        z80.A = calculatedCheckSum;
                    }
                    // Flash load the block and update IX, DE and AF.
                    else
                    {
                        int lastBytePos = memory.WriteDataBlock(tapeManager.NextBlock.BlockContent, dataTargetParameter);
                        int calculatedCheckSum = 0;

                        // The flags are set by the CP 0x01 operation at 0x05E0, where A = the current checksum.
                        int flagTest = calculatedCheckSum - 1;
                        z80.CarryFlag = BitOps.GetBit(flagTest, 8);
                        z80.SignFlag = Flags.SignFlag(flagTest);
                        z80.ZeroFlag = Flags.ZeroFlag(flagTest);
                        z80.HalfCarryFlag = Flags.HalfCarryFlagSub8(calculatedCheckSum, 1, z80.CarryFlag);
                        z80.Parity_OverflowFlag = Flags.OverflowFlagSub8(calculatedCheckSum, 1, flagTest);
                        z80.SubstractFlag = 1;
                        z80.F3Flag = 0;
                        z80.F5Flag = 0;
                        z80.A = calculatedCheckSum;

                        // Set IX to the same value as if the block had been loaded by the ROM routine.
                        z80.I1 = lastBytePos / 256;
                        z80.X = lastBytePos - 256 * z80.I1;

                        // Set DE to 0.
                        z80.D = 0;
                        z80.E = 0;
                    }

                    // Keep track of the index of the last loaded tape block. This information
                    // can be used to rewind the tape to the start position of the next block
                    // after an auto pause.
                    lastBlockIndex = tapeManager.NextBlock.Index;

                    // Skip forward to the end of the block which was just flash loaded into RAM.
                    tapeManager.GoToEndOfBlock(tapeManager.NextBlock.Index);

                    // Skip to the end of the LD BYTES ROM routine (actually a RET, so it doesn't really
                    // matter which RET instruction we point to here).
                    z80.PC = 0x05E2;
                }
                return lastBlockIndex;
            }
        }
    }
I'm sure there is still room for improvement here, but so far flash loading is much faster and more reliable than before!
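The checksum rule that the routine above relies on can be shown in isolation. This is only a sketch and not part of the emulator: `TapeChecksum` and its two methods are hypothetical names. It assumes the standard convention that the last byte of a block is the XOR of the flag byte and every data byte, so XOR:ing the flag byte, the data and the stored checksum gives 0 for an intact block (which matches the normal branch above setting A to 0).

```csharp
using System;

// Hypothetical helper for illustration only; not part of SoftSpectrum 48.
public static class TapeChecksum
{
    // Returns the XOR of the flag byte and every data byte.
    public static byte Compute(byte flagByte, byte[] data)
    {
        byte checksum = flagByte;
        foreach (byte b in data)
            checksum ^= b;
        return checksum;
    }

    // An intact block XORs to zero once the stored checksum byte is included.
    public static bool IsValid(byte flagByte, byte[] data, byte storedChecksum)
    {
        return (byte)(Compute(flagByte, data) ^ storedChecksum) == 0;
    }
}
```

For example, a data block with flag byte 0xFF and the two data bytes 0x01 and 0x02 has the checksum 0xFC, and any other stored value would make `IsValid` return false.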
After implementing the flash loader for Basic programs I noticed that some programs wouldn't load correctly. Specifically I had problems with programs consisting of a single line which somehow auto-runs and passes control over to machine code embedded in the Basic code. An example is the demo "SONG IN LINES Part 4" (1990, Busysoft & Fuxoft) which looks like this when opened in Tapir:
1600 LIST : RANDOMIZE USR 0[at 0,0][inverse 0][over 0][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][back][comma][comma] S touto ROMkou niesom ochotny spolupracovat !!![comma][comma][comma]
I don't know how this code works, but it wouldn't run correctly after being flash loaded. I suspect that I somehow didn't update the system variables correctly, so I implemented another method for flash loading, where instead of handling the system variables in the emulator I let the ROM routine handle this as it normally would. I intercept the LD_PROG_1 routine at 0x08B3, where the header has been interpreted and everything is prepared for loading the Basic program data block. I flash load the data block and return to ROM at 0x0805, which is right after the call to the LD_BYTES routine (this method keeps the stack intact). I also update the IX register pair to point to the end address of the loaded data block. In addition I set DE to zero (DE being the byte counter for the LD_BYTES routine). I actually added the IX and DE updates to the old routine, which made it work much better, but still not as well as this routine. Here is the resulting code from the Controller and TapeFlashLoader classes:
There are still some tape files that don't load correctly, and this is something I will investigate. However, I don't think the problem in those cases is the Basic loader.
Many games have custom loaders which don't require headers for the code blocks, while still using some of the ROM tape loading routines. I have noticed that these loaders often bypass the LOAD A DATA BLOCK ROM routine at 0x0802 which the Controller class monitors (see Flash loader - Part 1), so for headerless loaders I choose to monitor 0x059F instead, which is a bit into the routine, just before the data block is actually loaded. When there is no header, the target address for the data is fetched from the IX register instead. Here is the Controller class code for handling headerless blocks:

Most of the time, the regular loading process takes over when flash loading fails, but sometimes a file won't load correctly when flash loading is enabled. To make it easy to control this I added checkboxes for enabling flash load as well as speed load in the tape player window.
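The interception check for headerless blocks can be sketched roughly as follows. This is not the Controller code itself: `HeaderlessTrap`, its method names and its parameters are hypothetical, and the only values taken from the text are the address 0x059F and the fact that the target address comes from IX. The flash load flag stands in for the checkbox in the tape player window.

```csharp
using System;

// Hypothetical illustration; the real Controller class is organised differently.
public static class HeaderlessTrap
{
    // The address monitored for headerless blocks, as given in the text.
    public const int HeaderlessEntryPoint = 0x059F;

    // Flash load only when the user has enabled it, a tape is playing,
    // and the CPU has reached the monitored address.
    public static bool ShouldFlashLoad(int programCounter, bool tapePlaying, bool flashLoadEnabled)
    {
        return flashLoadEnabled && tapePlaying && programCounter == HeaderlessEntryPoint;
    }

    // Without a header, the target address is taken from the IX register pair.
    public static int TargetAddress(byte ixHigh, byte ixLow)
    {
        return (ixHigh << 8) | ixLow;
    }
}
```

When `ShouldFlashLoad` returns false the emulator would simply carry on executing the ROM code, which is what lets the regular loading process take over.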
Having implemented flash load for code blocks (see Flash loader - part 1) I went ahead and looked into how to flash load a Basic program. This is a bit more complex, since there are a number of system variables that need to be set correctly for the program to work. Aside from that, the process is the same as for code blocks, i.e. the Controller checks if the LD_BLOCK ROM routine is reached while a tape is playing. If the currently loading block is a program header, it will be flash loaded by a method in the Tape Flash Loader class, which looks like this:

The method loads the data block to RAM, starting at the address found in the system variable PROG. It returns the end position of the program in RAM, which is used by the calling method to update the IX register. I'm not sure if all system variables are handled correctly but this seems to work. The best source of information about the system variables and how a program is mapped in memory that I could find was the Spectrum 128 Rom bank 0 disassembly by Matthew Wilson and others (full credits in the file). Here is a list of the system variables that I have identified (partly through trial-and-error) as important:

Normally, a Basic program is loaded by the "Load a data block" ROM routine (at 0x0802). This routine loads the header and then makes room for the Basic program before loading it into RAM.
The header gives the ROM routine information about the following:
The ROM routine sets the E_LINE variable to the first address after the end of the program space. The two bytes stored at this address need to be preserved, so before E_LINE is updated, the bytes are copied from the previous address. Following the two bytes are a 0x0D byte and a 0x80 byte. Normally, the WORKSP, STKBOT and STKTOP variables seem to be set to the value of E_LINE + 2, so this is what I implemented (I tested this by breaking into the emulation at the end of the loading routine and checking the variables for a number of tape files). The VARS variable is set based on the header parameter and the PROG variable is preserved as it was before loading, as is the K_CUR variable. After the VARS area, a 0x80 byte is inserted. The ROM routine triggers autorun from the line number in the NEWPPC variable if it has a value < 32768. The autorun routine also needs a parameter for which statement number within the line it should start from. This is stored in NSPPC which must be set to 0.

Update 2019-08-10: A few updates to the code and text descriptions were made. Later, this method was replaced by a more robust method which is described in this post.

The nostalgic appeal of watching flickering border stripes while a Spectrum program is loading soon wears off, and almost every Spectrum emulator implements some technique to speed up the tape loading process (i.e. when loading TAP or TZX files). The most intuitive method of doing this is probably to just increase the emulation speed during tape loading, something I implemented in SoftSpectrum 48 early on. A more efficient but also more difficult method (at least if you want it to work on every type of tape file) is to bypass the translation of tape data to audio signals via the in port, and instead just load the data directly into RAM. This technique is sometimes called "flash load", and I will try to implement it to some degree in SoftSpectrum 48.
The basic principle

The basic principle behind flash load (as I understand it - which may be a simplification) is that you need to do four things:
Of course, things become much more complicated when you have custom loaders which use only some or none of the ROM routines, but I will start with a simple solution where I just intercept the ROM routine for loading a code data block.

Flash loading a code block

The Controller class orchestrates everything that goes on in the emulator, so it will control the flash load function as well. To do this it needs to do the following things:
Below is the Controller code for handling Tape flash load as described above:

Apart from the new Controller code and the new Tape flash loader class (which only acts as an intermediary between the Controller and the Memory class), the other important change is that the Tape item class has been extended with a data array which holds the data block content. Previously, the Tape item class only held information about tape blocks to be presented in the tape player window. A tape item was created for each header block (by the Tape manager when opening a TAP or TZX file). Now, a tape item is created not only for headers but also for data blocks. What happens when the Controller detects a code header being loaded is that it retrieves the data array from the tape item following the header and copies it to RAM.

Result

This first effort works surprisingly well, although the fact that program blocks are loaded in "real time" and that the code block flash load is not triggered until after the tape leader pulses have been processed (that is, when the LD_BLOCK routine is activated) means that it still takes quite a while to load a tape. Also, headerless code blocks are not handled, so this will probably be the next step.
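To make the copy step concrete, here is a minimal sketch of a tape item carrying a data array and of the copy into RAM. It is not the emulator's actual code: `TapeItemSketch`, `CodeBlockFlashLoadSketch` and `LoadCodeBlock` are hypothetical names, and RAM is modelled as a plain byte array. It assumes the standard 17-byte header layout (type byte, 10-character filename, then three little-endian 16-bit words: data length, parameter 1 and parameter 2), where parameter 1 of a code header (type 3) is the start address.

```csharp
using System;

// Hypothetical stand-in for a tape item that holds the content of a data block.
public sealed class TapeItemSketch
{
    public byte[] Data { get; }

    public TapeItemSketch(byte[] data)
    {
        Data = data;
    }
}

// Hypothetical illustration of copying a code block to RAM using its header.
public static class CodeBlockFlashLoadSketch
{
    // Reads a little-endian 16-bit word.
    private static int Word(byte[] bytes, int offset)
    {
        return bytes[offset] | (bytes[offset + 1] << 8);
    }

    // header: the 17 header bytes, without the flag byte and the checksum byte.
    // dataItem: the tape item that follows the header.
    // Returns the address just past the last byte written.
    // Note: this sketch does not protect the ROM area from being overwritten.
    public static int LoadCodeBlock(byte[] header, TapeItemSketch dataItem, byte[] ram)
    {
        if (header.Length != 17)
            throw new ArgumentException("Expected a 17-byte header.");
        if (header[0] != 3)
            throw new ArgumentException("Not a code header.");

        int length = Word(header, 11); // number of data bytes announced by the header
        int start = Word(header, 13);  // parameter 1: start address for code blocks

        // Never copy more bytes than the data block actually contains.
        int count = Math.Min(length, dataItem.Data.Length);
        if (start + count > ram.Length)
            throw new ArgumentException("Block does not fit in memory.");

        Array.Copy(dataItem.Data, 0, ram, start, count);
        return start + count;
    }
}
```

The returned end address is the kind of value a caller could use to update IX afterwards, as the later posts describe.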
ZX Spectrum Tape Files

The ZX Spectrum saves programs and data to tape in the form of audio recordings consisting of square wave signals with different wavelengths and durations. Here are some definitions:
Tape data is encoded as two 855 T-state pulses for binary zero, and two 1,710 T-state pulses for binary one. A standard ZX Spectrum tape file consists of a header block and a data block. The header block is 19 bytes long (17 bytes plus a flag byte and a checksum byte) and contains the filename and the type of data block which follows (program, array, screen or bytes). For program data, the header can include a line number to start from for programs that should execute automatically. To distinguish header blocks from data blocks, a sequence of leader pulses precedes each type of block. The leader pulse is 2,168 T-states long and is repeated 8,063 times for header blocks and 3,223 times for data blocks. After the leader pulses, two sync pulses (667 T-states plus 735 T-states long) follow to signal the beginning of the actual data. Read more about the tape file structure here. Custom tape routines can have other structures and encodings.

Emulator Tape Files

An emulator tape file contains all the necessary data for the emulator to recreate the audio pulses and input them to the emulator's audio in port. Some emulators can bypass tape loading routines and instead load the tape file data directly into emulated RAM, but this is not implemented in SoftSpectrum 48. The two most common tape file formats are:
SoftSpectrum 48 can load TAP files and most TZX files.

Process

The process for loading tape files in SoftSpectrum 48 is as follows:
Speed Loading

SoftSpectrum 48 uses a simple technique to increase the speed of the tape loading process. Whenever the CPU runs the instruction at 0x05C8, which is at the beginning of the ROM tape loading routines, CPU speed is increased to 400 %. Then, when the end of the tape file has been reached, speed is returned to its previous value.
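The speed loading rule can be sketched as a small state holder. This is an illustration under assumptions, not the emulator's code: `SpeedLoadSketch` and its members are hypothetical names, and it assumes that something calls `OnInstruction` once per executed instruction with the current program counter and a flag telling whether the end of the tape file has been reached.

```csharp
using System;

// Hypothetical illustration of the speed loading rule; not part of SoftSpectrum 48.
public sealed class SpeedLoadSketch
{
    // Address and speed taken from the description above.
    public const int TriggerAddress = 0x05C8;
    public const int BoostedSpeedPercent = 400;

    private int _savedSpeedPercent;

    public int SpeedPercent { get; private set; }
    public bool Boosted { get; private set; }

    public SpeedLoadSketch(int initialSpeedPercent)
    {
        SpeedPercent = initialSpeedPercent;
    }

    // Intended to be called once per executed instruction.
    public void OnInstruction(int programCounter, bool endOfTapeReached)
    {
        if (!Boosted && !endOfTapeReached && programCounter == TriggerAddress)
        {
            // Remember the current speed so that it can be restored later.
            _savedSpeedPercent = SpeedPercent;
            SpeedPercent = BoostedSpeedPercent;
            Boosted = true;
        }
        else if (Boosted && endOfTapeReached)
        {
            // The tape has ended: go back to the speed used before loading.
            SpeedPercent = _savedSpeedPercent;
            Boosted = false;
        }
    }
}
```

Saving the previous speed instead of hard-coding 100 % means a user who was already running the emulator at another speed gets that speed back after loading.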