Fun with packing

Fun with packing

Judson, Ross
I observed that a significant portion of Scala's generated .class files
is signature information, necessary for compilation.  This information
is not necessary for runtime, though, and I figured I'd modify my
classload-packer to excise it, and see what happens.  Things get pretty
small, is what happens :)

I packed sbazgui with a small utility I've placed at
(http://www.soletta.com/scala/packload.jar).  Packload is an executable
JAR file (it can pack itself, but it's so small it only saves about 3k),
so just double-click it or do (java -jar packload.jar) to run its
minimal interface.

You provide it with your main class, the name of the jar you want to
WRITE to (will be erased, so be careful), and then the source jars you
want to compress.  For a Scala program you will usually supply at least
the scala-library.jar, and also a jar containing your own class files.
I used the utility to pack sbazgui, which has the following components:

scala-library.jar : 1,072,659 bytes
sbaz.jar : 317,856 bytes
sbazgui.jar: 142,920 bytes
--------------------------------
total: 1,533,435 bytes

Packed, self-executing sbazgui.jar: 174,085 bytes, containing all of the
above, and double-clickable.
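
The size figures above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the class and method names (PackedSizeRatio, totalBytes, percentOfOriginal) are made up for this example, and the byte counts are the ones quoted in the post.

```java
public class PackedSizeRatio {
    /** Sums a list of jar sizes given in bytes. */
    public static long totalBytes(long... sizes) {
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        return total;
    }

    /** Returns the packed size as a percentage of the original size. */
    public static double percentOfOriginal(long packedBytes, long originalBytes) {
        return 100.0 * packedBytes / originalBytes;
    }

    public static void main(String[] args) {
        // scala-library.jar, sbaz.jar and sbazgui.jar, as listed in the post
        long total = totalBytes(1_072_659L, 317_856L, 142_920L);
        // the packed, self-executing sbazgui.jar
        double pct = percentOfOriginal(174_085L, total);
        System.out.println("total bytes: " + total);
        System.out.println("packed jar as percent of total: " + pct);
    }
}
```

The three source jars do add up to 1,533,435 bytes, and the packed jar comes to roughly 11% of that total.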

If we use the JDK's pack200 utility to compress scala-library.jar:

pack200 --unknown-attribute=strip -G -O --modification-time=latest scala-library.pack.gz scala-library.jar

scala-library.pack.gz: 105,110 bytes

If we execute:

unpack200 scala-library.pack.gz scala-library-small.jar

scala-library-small.jar: 837,704 bytes

So in a _runtime_ environment, we can pack the scala-library down to
around 100k if we don't intend to _compile_ against it.  I think this is
quite relevant to the applet case; the pack200 format is allowable for
J2SE 5 applets, and with the right pack200 command (including the
attribute stripping) one can create a downloadable archive that is quite
small.

I suspect that further optimization is possible if the stripped .class
files are then run through a bytecode obfuscation/optimization utility.

RJ

RE: Fun with packing

Judson, Ross
Forgot one thing:

I was thinking that perhaps the Scala compiler might want to offer an
option to write the signature into a separate file.  The compiler's
class reader could look for an embedded signature first to conform to
existing behavior, then look for the separate file.  

SomeObject.class
SomeObject.sig

It brings a few things to mind.  First, it becomes relatively simple to
reduce the size of a run-only deployment by including only class files.
Second, the signature functions a bit like a header, and there may be
optimizations to be had by having a cache of signatures, rather than
having to pull in an entire .class file (nsc's class reader may already
have lazy instantiation like this -- haven't checked).  Third, if a
separated unpickler exists, it might be used to do quite a bit of
runtime typing.  Fourth, given that unpickler, it might be interesting
to extend the signatures to include scaladoc information, which could
then be used within development environments.  
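
The proposed lookup order (embedded signature first, then a separate file) could be sketched roughly as follows. Everything here is hypothetical: SignatureLookup, sidecarFor and locate are invented names, the .sig extension is the one suggested above, and whether a class file carries an embedded signature is passed in as a flag rather than detected. This is not how scalac's class reader actually works.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

public class SignatureLookup {
    /** Where a signature was found. */
    public enum Source { EMBEDDED, SIDECAR_FILE }

    /** Maps SomeObject.class to its (hypothetical) sibling SomeObject.sig. */
    public static Path sidecarFor(Path classFile) {
        String name = classFile.getFileName().toString();
        String base = name.endsWith(".class")
                ? name.substring(0, name.length() - ".class".length())
                : name;
        return classFile.resolveSibling(base + ".sig");
    }

    /**
     * Prefers the embedded signature (existing behavior), then falls back
     * to the separate .sig file; empty if neither is available.
     */
    public static Optional<Source> locate(Path classFile, boolean hasEmbeddedSignature) {
        if (hasEmbeddedSignature) {
            return Optional.of(Source.EMBEDDED);
        }
        if (Files.isRegularFile(sidecarFor(classFile))) {
            return Optional.of(Source.SIDECAR_FILE);
        }
        return Optional.empty();
    }
}
```

A run-only deployment would then simply ship without the .sig files, and a compiler pointed at it would find no signature at all.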

RJ

Re: Fun with packing

Niko Korhonen
In reply to this post by Judson, Ross
Judson, Ross wrote:
> I observed that a significant portion of Scala's generated .class files
> is signature information, necessary for compilation.  This information
> is not necessary for runtime, though, and I figured I'd modify my
> classload-packer to excise it, and see what happens.  Things get pretty
> small, is what happens :)

It would be incredibly cool if this kind of utility were integrated
into Scala, or if the jar files in the standard Scala distribution could
be made smaller with this method!

Having to lug around a 1-megabyte jar file in addition to the JRE with
each application distribution is IMO a /huge/ problem. Shrinking that
megabyte file to a hundred-or-so kilobytes would make Scala a much more
viable software development platform.

The current situation IMO puts Scala in a bad position against the
competition. If we assume that everyone has a compatible JRE and .NET
Framework installed, we get these kinds of figures for runtime dependencies:

Java with JVM: No additional libraries
Scala with JVM: 1 MB jar file
C# with .NET: No additional libraries
Boo with .NET: 70 kB DLL file

So if small application distribution packages are a priority, Scala
seems the least viable option. I'd like to see this change, especially
since I consider Scala as a language to be far superior to the competition.

--
Niko Korhonen
SW engineer

Re: Fun with packing

Burak Emir
Niko Korhonen wrote:

> So if small application distribution packages are a priority, Scala
> seems the least viable option. I'd like to see this change, especially
> since I consider Scala as a language to be far superior to the
> competition.
>
If that's a priority, throwing away the parts of the library that you
don't need might save even more space than throwing away the symbol
table information, no?

I guess it would be possible to run standard Java tools that operate on
class files to find the dependencies... except for classes that are
only reached through reflection, of course.
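
As a rough illustration of what such a tool does, the sketch below reads the constant pool of a single class file and lists every class it names. ClassRefs and referencedClasses are made-up names for this example; a real dependency shaker would follow these references transitively from the main class, and, as noted above, it would miss any class that is only named in a string passed to reflection.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

public class ClassRefs {
    /** Returns the internal names (e.g. java/lang/Object) found in CONSTANT_Class entries. */
    public static Set<String> referencedClasses(InputStream classFile) {
        try {
            DataInputStream in = new DataInputStream(classFile);
            if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            in.readInt(); // minor_version and major_version
            int count = in.readUnsignedShort();
            String[] utf8 = new String[count];
            int[] classNameIndex = new int[count];
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                     // Utf8
                    case 7: classNameIndex[i] = in.readUnsignedShort(); break; // Class
                    case 5: case 6: skip(in, 8); i++; break;                   // Long/Double use two slots
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        skip(in, 4); break;
                    case 8: case 16: case 19: case 20: skip(in, 2); break;
                    case 15: skip(in, 3); break;                               // MethodHandle
                    default: throw new IOException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int idx : classNameIndex) {
                if (idx != 0 && utf8[idx] != null) names.add(utf8[idx]);
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]); // readFully guarantees all n bytes are consumed
    }
}
```

Array types show up as descriptors such as [Ljava/lang/String; and would need unwrapping before being followed.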

cheers,
Burak

--
Burak Emir

http://lamp.epfl.ch/~emir

packload.jar as sbaz package? Re: Fun with packing

Burak Emir
In reply to this post by Judson, Ross
Hi Ross,

Good stuff! Would you consider making an sbaz package of it?

Don't worry about writing runner scripts; a doc/README containing
this email would be enough, I assume.

There is excellent ant support for building sbaz packages...

Next question: given that you supply the main class, does it follow
dependencies based on the bytecode? What happens if one uses reflection
to instantiate modules?
 
cheers,
Burak

Judson, Ross wrote:

>I observed that a significant portion of Scala's generated .class files
>is signature information, necessary for compilation.  This information
>is not necessary for runtime, though, and I figured I'd modify my
>classload-packer to excise it, and see what happens.  Things get pretty
>small, is what happens :)
>
>I packed sbazgui with a small utility I've placed at
>(http://www.soletta.com/scala/packload.jar).  Packload is an executable
>JAR file (it can pack itself, but it's so small it only saves about 3k),
>so just double-click it or do (java -jar packload.jar) to run its
>minimal interface.
>
>You provide it with your main class, the name of the jar you want to
>WRITE to (will be erased, so be careful), and then the source jars you
>want to compress.  For a Scala program you will usually supply at least
>the scala-library.jar, and also a jar containing your own class files.
>I used the utility to pack sbazgui, which has the following components:
>
>scala-library.jar : 1,072,659 bytes
>sbaz.jar : 317,856 bytes
>sbazgui.jar: 142,920 bytes
>--------------------------------
>total: 1,533,435 bytes
>
>Packed, self-executing sbazgui.jar: 174,085 bytes, containing all of the
>above, and double-clickable.
>
>If we use the JDK's pack200 utility to compress scala-library.jar:
>
>pack200 --unknown-attribute=strip -G -O --modification-time=latest
>scala-library.pack.gz scala-library.jar
>
>scala-library.pack.gz: 105,110 bytes
>
>If we execute:
>
>unpack200 scala-library.pack.gz scala-library-small.jar
>
>scala-library-small.jar: 837,704 bytes
>
>So in a _runtime_ environment, we can pack the scala-library down to
>around 100k if we don't intend to _compile_ against it.  I think this is
>quite relevant to the applet case; the pack200 format is allowable for
>J2SE 5 applets, and with the right pack200 command (including the
>attribute stripping) one can create a downloadable archive that is quite
>small.
>
>I suspect that further optimization is possible if the stripped .class
>files are then run through a bytecode obfuscation/optimization utility.
>
>RJ

--
Burak Emir

http://lamp.epfl.ch/~emir

Re: Fun with packing

Lex Spoon
In reply to this post by Niko Korhonen
Niko Korhonen <[hidden email]> writes:
> The current situation IMO puts Scala in a bad position against the
> competition. If we assume that everyone has a compatible JRE and .NET
> Framework installed, we get these kinds of figures for runtime
> dependencies:
>
> Java with JVM: No additional libraries
> Scala with JVM: 1 MB jar file
> C# with .NET: No additional libraries
> Boo with .NET: 70 kB DLL file


It seems to me that it is Boo in the bad position.  What kind of
collections library does Boo have in its 70 kB DLL?  What kind of XML
support?  What concurrency models are available?  Scala's bytes buy
you a lot.  It would strike me as negative progress to start worrying
more about bytes than about functionality and convenience.


That said, maybe you can be more specific about what kinds of
applications you are picturing that Scala misses out on?


-Lex


PS -- rt.jar is 40 MB on my machine....

RE: Re: Fun with packing

Judson, Ross
In reply to this post by Judson, Ross
Agreed -- does Boo ship with an effective _functional_ library?  I don't
believe so; worrying too much about bytes is counter-productive.  The
packing experiment was intended to illustrate ways to create tight,
lightweight application deployments, such as applets or webstart.  

I'll put the packload utility into sbaz this weekend; I wasn't sure if
it really belonged there, but it's useful for deploying into certain
situations.

Burak -- packload does no introspection or analysis, and does not drop
any information other than debug, attributes, and ordering.  It uses the
Pack200 API built into JDK 5 to perform an inline decompression of the
Pack200 information, then classloads from that on the fly.  It embeds a
small stub loader into the resulting JAR file that bootstraps the rest
of the process.  We could use bytecode optimizing tools to pare down
the JAR _first_, then use packload on the result to create something
even smaller.
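
The "classload on the fly" step can be pictured with a class loader that defines classes from byte arrays held in memory. This is a hedged sketch, not packload's actual code: InMemoryClassLoader is an invented name, and the unpacking step that would fill the map is left out, partly because the java.util.jar.Pack200 API that JDK 5 introduced was removed again in JDK 14.

```java
import java.util.Map;

public class InMemoryClassLoader extends ClassLoader {
    // binary class name (e.g. "scala.Predef") -> raw class file bytes;
    // a stub loader would fill this from the unpacked archive
    private final Map<String, byte[]> classBytes;

    public InMemoryClassLoader(Map<String, byte[]> classBytes, ClassLoader parent) {
        super(parent);
        this.classBytes = classBytes;
    }

    /** Called only after the parent loader has failed to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = classBytes.get(name);
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytes, 0, bytes.length);
    }
}
```

A bootstrap stub would build the map, create the loader, look up the main class by name and invoke its main method reflectively.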

RJ

Re: Fun with packing

sean.mcdirmid
My undergraduate thesis was on building a JVM for the Palm Pilot back
in the days when it had to run in 32K or something like that. I wrote
a bytecode translator that packed the class files into something that
was easy to load and shared constant pool entries between classes.
This saved a lot of space, and I was able to fit the entire JDK 1.0.2
class library in about 40K. However, this was before the library
became huge in JDK 1.1.

Pack200 sounds like what I did with Ghost, and I think Sun was doing  
something like this before with J2ME.

Sean

On Apr 28, 2006, at 4:56 PM, Judson, Ross wrote:

> Agreed -- does Boo ship with an effective _functional_ library?  I don't
> believe so; worrying too much about bytes is counter-productive.  The
> packing experiment was intended to illustrate ways to create tight,
> lightweight application deployments, such as applets or webstart.
>
> I'll put the packload utility into sbaz this weekend; I wasn't sure if
> it really belonged there, but it's useful for deploying into certain
> situations.
>
> Burak -- packload does no introspection or analysis, and does not drop
> any information other than debug, attributes, and ordering.  It uses the
> Pack200 API built into JDK 5 to perform an inline decompression of the
> Pack200 information, then classloads from that on the fly.  It embeds a
> small stub loader into the resulting JAR file that bootstraps the rest
> of the process.  We could use bytecode optimizing tools to pare down
> the JAR _first_, then use packload on the result to create something
> even smaller.
>
> RJ

Re: Fun with packing

Martin Odersky
In reply to this post by Judson, Ross
Hi, Ross:

> I observed that a significant portion of Scala's generated .class files
> is signature information, necessary for compilation.  This information
> is not necessary for runtime, though, and I figured I'd modify my
> classload-packer to excise it, and see what happens.  Things get pretty
> small, is what happens :)
>
I am surprised by your numbers. To check, I instrumented the scalac
compiler to print the size of pickled data. I counted everything inside
a ScalaSignature attribute. For the scala library (i.e. everything in
the scala/src/library/scala part of the svn repository),
I got 216778 bytes.
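
The same kind of measurement can be approximated from outside the compiler by walking a compiled class file and adding up the payload of a named class-level attribute. The sketch below is an assumption-laden stand-in for the scalac instrumentation described above: AttributeSize and classAttributeBytes are invented names, and it only looks at attributes attached to the class itself, which is where a ScalaSignature attribute is assumed to live.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class AttributeSize {
    /** Total payload bytes of class-level attributes with the given name (0 if absent). */
    public static long classAttributeBytes(InputStream classFile, String attrName) {
        try {
            DataInputStream in = new DataInputStream(classFile);
            if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            in.readInt(); // minor_version and major_version
            int count = in.readUnsignedShort();
            String[] utf8 = new String[count];
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;   // Utf8 (attribute names live here)
                    case 5: case 6: skip(in, 8); i++; break; // Long/Double use two slots
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        skip(in, 4); break;
                    case 7: case 8: case 16: case 19: case 20: skip(in, 2); break;
                    case 15: skip(in, 3); break;
                    default: throw new IOException("unknown constant pool tag " + tag);
                }
            }
            skip(in, 6);                          // access_flags, this_class, super_class
            skip(in, 2 * in.readUnsignedShort()); // interfaces
            skipMembers(in);                      // fields
            skipMembers(in);                      // methods
            long total = 0;
            int attrs = in.readUnsignedShort();
            for (int a = 0; a < attrs; a++) {
                String name = utf8[in.readUnsignedShort()];
                int length = in.readInt();
                if (attrName.equals(name)) total += length;
                skip(in, length);
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Skips a fields or methods table, including each member's attributes. */
    private static void skipMembers(DataInputStream in) throws IOException {
        int members = in.readUnsignedShort();
        for (int m = 0; m < members; m++) {
            skip(in, 6); // access_flags, name_index, descriptor_index
            int attrs = in.readUnsignedShort();
            for (int a = 0; a < attrs; a++) {
                skip(in, 2);            // attribute_name_index
                skip(in, in.readInt()); // attribute payload
            }
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }
}
```

Summing classAttributeBytes(stream, "ScalaSignature") over every entry of a library jar would give a figure comparable to the one quoted above, under the assumptions stated.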

When tarred without compression, the library produces a file of 3.5M,
whereas a jar with standard compression has size 1.4M. So this seems to
indicate that Scala signature information is about 7% of uncompressed
class file size. It's possible that signature information represents a
larger percentage of compressed file size, since the signature files are
already quite compact and therefore might compress less well than the
rest of the class files. But it is still a far cry from the compression
ratios you get. So there must be something else that gets dropped. I
wonder what?

Cheers

  -- Martin

Re: Fun with packing

sean.mcdirmid
Could it be because you aren't using the constant pool to represent
the signatures? If that's the case, then Pack200 won't be able to
eliminate redundant Scala signature information between class files,
which is one of the main sources of its size reduction.

Sean

Pack200 works most efficiently on Java class files. It uses several
techniques to efficiently reduce the size of JAR files:

It merges and sorts the constant-pool data in the class files and
co-locates them in the archive.
It removes redundant class attributes.
It stores internal data structures.
It uses delta and variable-length encoding.
It chooses optimum coding types for secondary compression.


On Apr 28, 2006, at 5:35 PM, Martin Odersky wrote:

> Hi, Ross:
>
>> I observed that a significant portion of Scala's generated .class files
>> is signature information, necessary for compilation.  This information
>> is not necessary for runtime, though, and I figured I'd modify my
>> classload-packer to excise it, and see what happens.  Things get pretty
>> small, is what happens :)
> I am surprised by your numbers. To check, I instrumented the scalac
> compiler to print the size of pickled data. I counted everything inside
> a ScalaSignature attribute. For the scala library (i.e. everything in
> the scala/src/library/scala part of the svn repository),
> I got 216778 bytes.
>
> When tarred without compression, the library produces a file of 3.5M,
> whereas a jar with standard compression has size 1.4M. So this seems to
> indicate that Scala signature information is about 7% of uncompressed
> class file size. It's possible that signature information represents a
> larger percentage of compressed file size, since the signature files are
> already quite compact and therefore might compress less well than the
> rest of the class files. But it is still a far cry from the compression
> ratios you get. So there must be something else that gets dropped. I
> wonder what?
>
> Cheers
>
>  -- Martin

Re: Fun with packing

Martin Odersky
Sean McDirmid wrote:
> Could it be because you aren't using the constant pool to represent the
> signatures? If that's the case, then Pack200 won't be able to eliminate
> redundant Scala signature information between class files, which is one
> of the main sources of its size reduction.

That's quite possible. I think it would be good to see for scala-library
the following data:

1. size of original jar
2. size of original jar treated with pack200
3. size of jar with scala signatures stripped out
4. size of jar with scala signatures stripped out treated with pack200

I could not get all the data from Ross' mail. Ross, can you fill in the
missing bits?

Cheers

  -- Martin

Re: Fun with packing

Stefan Matthias Aust
Martin Odersky wrote:
> I think it would be good to see for scala-library
> the following data:
>
> 1. size of original jar
> 2. size of original jar treated with pack200
> 3. size of jar with scala signatures stripped out
> 4. size of jar with scala signatures stripped out treated with pack200

I'm not Ross, but...

original library jar size: 1072659 bytes
repack with -G (strip debug): 1036215 bytes
repack with -G -O (reorder): 1033293 bytes
repack with -G -O -Ustrip (removing unknown attributes): 854048 bytes

This means the Scala-specific data is 179245 bytes (1033293 - 854048),
or about 17% of the jar.

pack200 -E9 (best compression) of repacked jar: 105824 bytes

This is ~10% of the original jar. I think Pack200 does better the
smaller the class files are, because small class files typically have
a larger metadata overhead. The .NET assembly format is more space
efficient.
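
For readers who want to retrace the percentages, here is the arithmetic spelled out. The sketch assumes that the "Scala-specific data" figure is the difference between the -G -O repack and the -G -O -Ustrip repack; RepackArithmetic and its method names are invented for the example.

```java
public class RepackArithmetic {
    /** Bytes removed by a repacking step. */
    public static long strippedBytes(long beforeBytes, long afterBytes) {
        return beforeBytes - afterBytes;
    }

    /** part as a percentage of whole. */
    public static double percent(long part, long whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        long original = 1_072_659L;      // original library jar
        long reordered = 1_033_293L;     // repack with -G -O
        long attrsStripped = 854_048L;   // repack with -G -O -Ustrip
        long packed = 105_824L;          // pack200 -E9 of the repacked jar

        long scalaSpecific = strippedBytes(reordered, attrsStripped);
        System.out.println("bytes removed by -Ustrip: " + scalaSpecific);
        System.out.println("as percent of the -G -O jar: " + percent(scalaSpecific, reordered));
        System.out.println("packed as percent of original: " + percent(packed, original));
    }
}
```

The difference is exactly 179245 bytes, a little over 17% of the reordered jar, and the packed file is just under 10% of the original.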

Regards,

--
Stefan Matthias Aust

Re: Fun with packing

Niko Korhonen
In reply to this post by Lex Spoon
Lex Spoon wrote:
> It seems to me that it is Boo in the bad position.  What kind of
> collections library does Boo have in its 70 kB DLL?  What kind of XML
> support?  What concurrency models are available?

Everything that's available in the .NET Framework, plus Python-style
arrays, lists and hash tables, in the same way that Scala has
everything that's available in the Java standard library.

> Scala's bytes buy
> you a lot.  It would strike me as negative progress to start worrying
> more about bytes than about functionality and convenience.

In a regular case I'd agree strongly, however...

> That said, maybe you can be more specific about what kinds of
> applications you are picturing that Scala misses out on?

...I'm thinking of small applet-like tools and CLI/GUI utilities. In
Java, if you assume that all clients have a compatible JRE installed,
you can write a nifty and even pretty complex tool that fits into a 10
kilobyte JAR, because only the application code must be distributed.

If you write ten nifty tools in Scala, you have to redistribute the
Scala runtime library with each of them. Suddenly the 10 * 10 kB
download becomes 10 * (10 + 1048) kB.

If you're writing a /large/ application in Scala, where the Scala
runtime size is insignificant compared to the application distribution
size, or the application is distributed offline, or the functionality of
the application is important enough, then it doesn't matter.

> PS -- rt.jar is 40 MB on my machine....

Yes, and there was a time when this was a huge problem. But nowadays we
can more or less assume that all clients have a compatible JRE installed
and people have become accustomed to having a JRE hanging around.

--
Niko Korhonen
SW engineer

RE: Re: Fun with packing

Judson, Ross
In reply to this post by Judson, Ross
Thanks for doing this; I have been unavailable for the past few
days...apologies for not answering sooner.

Pack200 appears to be very efficient at compressing .class files in
general; the -Ustrip pulls away the attributes that Pack200 doesn't do
much with (the gzipping helps somewhat, but isn't as good as not having
them there in the first place).

This is really about deciding if it's worthwhile to have a "deploy" mode
for Scala, versus a "compiled" mode.  I'd say the jury is still quite
out on that one.  There is a limited set of circumstances under which
we want the compression we're outlining below, and perhaps having a tool
that can get us there is good enough.  

It is useful to know how dense compiled Scala can become!

RJ


-----Original Message-----
From: news [mailto:[hidden email]] On Behalf Of Stefan Matthias Aust
Sent: Sunday, April 30, 2006 5:56 PM
To: [hidden email]
Subject: Re: Fun with packing

Martin Odersky wrote:
> I think it would be good to see for scala-library the following data:
>
> 1. size of original jar
> 2. size of original jar treated with pack200
> 3. size of jar with scala signatures stripped out
> 4. size of jar with scala signatures stripped out treated with pack200

I'm not Ross, but...

original library jar size: 1072659 bytes
repack with -G (strip debug): 1036215 bytes
repack with -G -O (reorder): 1033293 bytes
repack with -G -O -Ustrip (removing unknown attributes): 854048 bytes

This means the Scala-specific data is 179245 bytes (1033293 - 854048),
or about 17% of the jar.

pack200 -E9 (best compression) of repacked jar: 105824 bytes

This is ~10% of the original jar. I think Pack200 does better the
smaller the class files are, because small class files typically have
a larger metadata overhead. The .NET assembly format is more space
efficient.

Regards,

--
Stefan Matthias Aust