Q1: When I run your program the output is all messed up.
Q2: The amount of RAM reported is wrong.
Q3: Did you code all of these yourself?
Q4: What resources you use to find out about all the different architectures?
Q5: Why?
Q6: What are the most unusual opcodes you've come across?
Q7: How do you port to a new architecture?
Q8: This project has *horrible* methodology!
Q9: Why not just write it in C and compare the generated binaries?

--------------------


Q1:  When I run your program the output is all messed up.

A1:	That's more of a statement than a question, but OK.

	On some architectures, most notably x86, and most recently with 
	core and core2 processors, you will get ugly results.  
	Because ll is optimized for size, it reports blindly what's in the 
	/proc/cpuinfo file.  If intel or your BIOS manufacturer put really 
	ugly and pointlessly long info there, there's not much I can do 
	about it.

	If you want pretty output, use linux_logo instead.


Q2: The amount of RAM reported is wrong.

A2:	See the previous question.  

	I report what the sysinfo syscall reports.

	This doesn't report ACPI, shared video ram, and other things that 
	might make you appear to have less RAM.  

	For a more accurate tool use linux_logo.


Q3: Did you code all of these yourself?

A3:	Originally I only had an x86 version that I posted to my website,
	and Stephan Walter found the page and sent me a more optimized 
	version.  We had a little contest going on for a while, seeing
	who could shave off the most bytes from the x86 code.  He's the 
	one who had the great idea of using LZSS compression.  
	Since in the end all of the other architecture's code are loosely 
	based on the x86 code, a lot is owed to him.

	Overall though, most of the code was done by me, by hand.

Q4: What resources you use to find out about all the different architectures?

A4:	The architecture manuals provided by the manufacturer are the best 
	resource there is, as the companies involved want you to use their 
	processor.
    
	Knowing what instructions does what doesn't help though if you don't
	know how to interface with the Linux kernel.  The best reference for 
	that is the web.  

	Bizzarrely the best code examples you'll find are by shell-coders.  
	These are hackers who try to explot buffer overflow flaws in 
	programs to get a root shell on your machine (hence the 
	shell-coder moniker).  While their motives might be questionable, 
	they are the experts at using Linux at the syscall level without
	a C library and their source examples are an invaluable resource.
    
	There are Other useful tools.  

	strace is great at helping you see if you are calling the syscalls 
	with the proper arguments.

	objdump (specifically the --disassemble-all option) will show you 
	exactly how much space each instruction is taking in the final 
	exceutable.  

	The linux kernel source is useful, primarily for the asm/unistd.h
	header which has all the syscall info for an arch
	Overall the linx kernel source doesn't help much.  Neither
	does the glibc code, nor does using gcc -S.


Q5: Why?

A5:	Assembly language is the ultimate puzzle.   You have this black
	box (a computer) that only accepts a series of coded numbers.
	Different numbers make it do specified things.  Using only this
	limited amount of numbered codes, make the machine do the
	task at hand.  Extra Credit:  make it as small as possible
	(or as fast as possible).  Fun!
    
	More practically, my work as a computer architect has me often
	working at a low level with different architectures.  It is nice
	having a "rosetta stone" set of programs that all do more or less
	the same thing, but on different platforms.  Just knowing how
	to run an exit() syscall on a platform can make programming it
	in assembly that much easier.
    
    
Q6: What are the weirdest opcodes you've come across?    

A6:    abcd  - m68k
       aaa   - x86
       addme - ppc
       eieio - ppc
       pea   - m68k
       bra   - m68k,sh3,microblaze
       bras  - s390
       sex   - such an opcode exists, but not in any platforms linux supports.
               Most processors have a "Sign EXtend" instruction but most
	       companies were too afraid to use that opcode (for example,
	       it's cdq on x86)
       brb   - vax
       spanc - vax
       shad  - sh3
       lfsux - ppc
       stfsux - ppc
       sob   - pdp-11
       doze/nap/sleep/rvwinkle - power6
       blei  - microblaze
       brad,braid,bald - microblaze
       

Q7: How do you port to a new architecture?

A7: First step is to find a machine from that architecture capable of
    running Linux.  For the first few easy ports this was trivial.
    It got harder and harder to find machines.  Eventually
    for some of the more obscure I had to use simulators (vax, m68k)
    and for sh3 I ended up using a user-space only qemu.
    
    Second step is getting development tools.  This is pretty easy
    if you are running a full distribution on actual hardware.
    It's a bit harder in the obscure cases; you have to build
    cross-compile tools and that can be tricky.
    
    Third step is to find architectural documentation.  The biggest
    help is the manual the company provides that describes the
    architecture and all available opcodes.  That gets you most
    of the way.  There can still be some tricky aspects.  Finding
    out how to do syscalls properly can be a hurdle (best help
    is usually the arch-specific unistd.h file that comes with linux.
    Next is the uclibc code.  Worst case you might have to statically
    link a binary and dig through a disassembly of the code with
    objdump).  Another problem can be differences in the assembly
    gas (the gnu assembler) expects.  The gas info page is a big
    help here.
    
    Fourth step is to start coding.  Usually I start by getting
    the "exit()" syscall working, as that's one of the easier
    syscalls and you pass along an exit value that's easy
    to check with "echo $?".  Once that works, it's time
    to do a "hello world" which exercises the write() syscall
    and memory accesses.  After this gradually build up to
    the system info lines, then to the centering part (division
    can be tricky).  Last is usually the actual logo decompression.
    
    The fifth step is debugging, which can be difficult.  The 
    best tool to help is "strace" which shows which syscalls
    are happening and which values.  gdb (if it is working)
    also can be a big help.  Some of the more obscure
    architectures you can't have either strace of gdb and
    have to resort to scattering write() and exit() calls
    around, plus a lot of commenting out.
    
    The sixth and final step is optimizing.  There is no
    quick guide to this, you just have to find the parts
    of the code that are excessively big and try to make
    them smaller.  Using objdump to diassemble the code
    before and after is a good tactic, especially on
    architectures with variable-sized instructions.
    
Q8: This project has *horrible* methodology!

A8: Yes, it didn't start as a code-density study.  It started as me
	trying to port a program I liked to assembly language, just for fun.

	To be perfectly fair you would have to control for:

	1. Exectuable overhead.
		ELF Linux executables have more overehad than DOS COM
		64-bit ELF headers tend to be bigger than 32-bit

	2. The code pretty printing.
		x86 for example detects number of processors
		Other architectures don't.
		Going back and fixing all to match would be a pain.

	Even then it's a long list of rules you'd have to enforce.

	And once you have the rules, people will start over-optimizing
		the code to take advantage of the rules.


Q9: Why not just write it in C and compare the generated binaries?

A9: That's not testing the raw code density of your architecture,
	that's just testing the compilers.

	You still have to account for executable differences.
	You might have architectures w/o a C compiler.
	The C compiler is likely not optimizing for size, even with -Os