我希望提交以下代码(显然适合java.util),可以显着提高性能并减少java.util.UUID的无用分配。在让我自己接受JDK维护者的判断之前,请帮助我发现错误和愚蠢! :-)

         benchmark instances  Bytes    ns linear runtime
 JdkUuidFromString     51.00 1544.0 608.2 ==============================
NessUuidFromString      2.00   72.0 179.1 ========
   JdkUuidToString     31.00 1720.0 321.4 ===============
  NessUuidToString      3.00  200.0  51.5 ==


FromString获得3倍的加速和对象分配的1/25。 ToString的速度提高了6倍,对象分配提高了1/10。

代码如下:

/**
 * A class that provides an alternate implementation of {@link
 * UUID#fromString(String)} and {@link UUID#toString()}.
 *
 * <p> The version in the JDK uses {@link String#split(String)}
 * which does not compile the regular expression that is used for splitting
 * the UUID string and results in the allocation of multiple strings in a
 * string array. We decided to write {@link NessUUID} when we ran into
 * performance issues with the garbage produced by the JDK class.
 *
 */
public class NessUUID {
    private NessUUID() {}

    private static final int NUM_ALPHA_DIFF = 'A' - '9' - 1;
    private static final int LOWER_UPPER_DIFF = 'a' - 'A';

    // FROM STRING

    public static UUID fromString(String str) {
        int dashCount = 4;
        int [] dashPos = new int [6];
        dashPos[0] = -1;
        dashPos[5] = str.length();

        for (int i = str.length()-1; i >= 0; i--) {
            if (str.charAt(i) == '-') {
                if (dashCount == 0) {
                    throw new IllegalArgumentException("Invalid UUID string: " + str);
                }
                dashPos[dashCount--] = i;
            }
        }

        if (dashCount > 0) {
            throw new IllegalArgumentException("Invalid UUID string: " + str);
        }

        long mostSigBits = decode(str, dashPos, 0) & 0xffffffffL;
        mostSigBits <<= 16;
        mostSigBits |= (decode(str, dashPos, 1) & 0xffffL);
        mostSigBits <<= 16;
        mostSigBits |= (decode(str,  dashPos, 2) & 0xffffL);

        long leastSigBits = (decode(str,  dashPos, 3) & 0xffffL);
        leastSigBits <<= 48;
        leastSigBits |= (decode(str,  dashPos, 4) & 0xffffffffffffL);

        return new UUID(mostSigBits, leastSigBits);
    }

    private static long decode(final String str, final int [] dashPos, final int field) {
        int start = dashPos[field]+1;
        int end = dashPos[field+1];
        if (start >= end) {
            throw new IllegalArgumentException("Invalid UUID string: " + str);
        }
        long curr = 0;
        for (int i = start; i < end; i++) {
            int x = getNibbleFromChar(str.charAt(i));
            curr <<= 4;
            if (curr < 0) {
                throw new NumberFormatException("long overflow");
            }
            curr |= x;
        }
        return curr;
    }

    static int getNibbleFromChar(final char c)
    {
        int x = c - '0';
        if (x > 9) {
            x -= NUM_ALPHA_DIFF; // difference between '9' and 'A'
            if (x > 15) {
                x -= LOWER_UPPER_DIFF; // difference between 'a' and 'A'
            }
            if (x < 10) {
                throw new IllegalArgumentException(c + " is not a valid character for an UUID string");
            }
        }

        if (x < 0 || x > 15) {
            throw new IllegalArgumentException(c + " is not a valid character for an UUID string");
        }

        return x;
    }

    // TO STRING

    public static String toString(UUID uuid)
    {
        return toString(uuid.getMostSignificantBits(), uuid.getLeastSignificantBits());
    }

    /** Roughly patterned (read: stolen) from java.util.UUID and java.lang.Long. */
    public static String toString(long msb, long lsb)
    {
        char[] uuidChars = new char[36];

        digits(uuidChars, 0, 8, msb >> 32);
        uuidChars[8] = '-';
        digits(uuidChars, 9, 4, msb >> 16);
        uuidChars[13] = '-';
        digits(uuidChars, 14, 4, msb);
        uuidChars[18] = '-';
        digits(uuidChars, 19, 4, lsb >> 48);
        uuidChars[23] = '-';
        digits(uuidChars, 24, 12, lsb);

        return new String(uuidChars);
    }

    private static void digits(char[] dest, int offset, int digits, long val) {
        long hi = 1L << (digits * 4);
        toUnsignedString(dest, offset, digits, hi | (val & (hi - 1)), 4);
    }

    private final static char[] DIGITS = {
        '0' , '1' , '2' , '3' , '4' , '5' ,
        '6' , '7' , '8' , '9' , 'a' , 'b' ,
        'c' , 'd' , 'e' , 'f' , 'g' , 'h' ,
        'i' , 'j' , 'k' , 'l' , 'm' , 'n' ,
        'o' , 'p' , 'q' , 'r' , 's' , 't' ,
        'u' , 'v' , 'w' , 'x' , 'y' , 'z'
    };

    private static void toUnsignedString(char[] dest, int offset, int len, long i, int shift) {
        int charPos = len;
        int radix = 1 << shift;
        long mask = radix - 1;
        do {
            dest[offset + --charPos] = DIGITS[(int)(i & mask)];
            i >>>= shift;
        } while (i != 0 && charPos > 0);
    }
}


评论

在Java 7中,String.split()对不包含元字符的单个字符模式进行了“快速路径”改进。在这种情况下,在“-”上分割将采用快速路径。此快速路径避免使用正则表达式引擎,并提高了琐碎文件的性能。不过,很高兴看到这些改进。

toString(long msb,long lsb)应该是私有的。

#1 楼

我无法使Caliper运行,但破解了自己的测试:

我的最初结果是:

warmup
JdkFrom: 1787.38 JdkTo: 635.12 NessFrom: 460.15 NessTo: 183.67 [-4552303853801426784, 69220000, -4552303853801426784, 69220000]
Real Run
JdkFrom: 1415.68 JdkTo: 553.28 NessFrom: 426.29 NessTo:  94.69 [-4552303853801426784, 69220000, -4552303853801426784, 69220000]
JdkFrom: 1394.24 JdkTo: 387.14 NessFrom: 340.78 NessTo:  59.33 [-4552303853801426784, 69220000, -4552303853801426784, 69220000]
JdkFrom: 1378.38 JdkTo: 339.20 NessFrom: 325.73 NessTo:  59.20 [-4552303853801426784, 69220000, -4552303853801426784, 69220000]
JdkFrom: 1381.61 JdkTo: 334.28 NessFrom: 389.30 NessTo:  59.09 [-4552303853801426784, 69220000, -4552303853801426784, 69220000]


所以,从表面上看,是的,您的算法就快得多了。

对于代码审查,我有一些评论:

fromString()


我不喜欢这样,您会忽略UUID的必需格式,实际上,您说它有4个破折号就很酷,但是,实际上,破折号之间的位数非常大,而您忽略了它。
我觉得您应该在验证和计算破折号的同时计算长位。此后重复循环似乎是多余的。
如果您正在寻找原始性能,我发现的一个窍门是查找表有很大的不同...我将举一个例子。

toString()


我不喜欢公共toString(long,long)方法。这不是“对称的”。仅toString(UUID)应该是公开的。
DIGITS代码似乎旨在满足许多不同的基数(基数是复数)。这使这种特殊情况显得有些笨重。
方法调用的层次太多。可能要浅得多。


考虑:

我对此有一个破解,因此决定可以做得更好....考虑以下结果:

warmup
JdkFrom: 1929.14 JdkTo: 542.10 NessFrom: 270.43 NessTo: 175.71 [2254274162472357232, 70459000, 2254274162472357232, 70459000]
Real Run
JdkFrom: 1569.85 JdkTo: 404.93 NessFrom: 249.37 NessTo:  45.94 [2254274162472357232, 70459000, 2254274162472357232, 70459000]
JdkFrom: 1528.79 JdkTo: 279.55 NessFrom: 114.74 NessTo:  44.71 [2254274162472357232, 70459000, 2254274162472357232, 70459000]
JdkFrom: 1657.85 JdkTo: 271.24 NessFrom: 118.20 NessTo:  44.43 [2254274162472357232, 70459000, 2254274162472357232, 70459000]
JdkFrom: 1563.52 JdkTo: 273.69 NessFrom: 140.96 NessTo:  46.46 [2254274162472357232, 70459000, 2254274162472357232, 70459000]


这比fromString的版本快三倍,比toString的速度快0.2倍。

这里是(以我的经验)与Java一样快的代码:

package uuid;

import java.util.Arrays;
import java.util.UUID;

/**
 * A class that provides an alternate implementation of {@link
 * UUID#fromString(String)} and {@link UUID#toString()}.
 *
 * <p> The version in the JDK uses {@link String#split(String)}
 * which does not compile the regular expression that is used for splitting
 * the UUID string and results in the allocation of multiple strings in a
 * string array. We decided to write {@link NessUUID} when we ran into
 * performance issues with the garbage produced by the JDK class.
 *
 */
public class NessUUID {
    private NessUUID() {}

    // lookup is an array indexed by the **char**, and it has
    // valid values set with the decimal value of the hex char.
    private static final long[] lookup = buildLookup();
    private static final int DASH = -1;
    private static final int ERROR = -2;
    private static final long[] buildLookup() {
        long [] lu = new long[128];
        Arrays.fill(lu, ERROR);
        lu['0'] = 0;
        lu['1'] = 1;
        lu['2'] = 2;
        lu['3'] = 3;
        lu['4'] = 4;
        lu['5'] = 5;
        lu['6'] = 6;
        lu['7'] = 7;
        lu['8'] = 8;
        lu['9'] = 9;
        lu['a'] = 10;
        lu['b'] = 11;
        lu['c'] = 12;
        lu['d'] = 13;
        lu['e'] = 14;
        lu['f'] = 15;
        lu['A'] = 10;
        lu['B'] = 11;
        lu['C'] = 12;
        lu['D'] = 13;
        lu['E'] = 14;
        lu['F'] = 15;
        lu['-'] = DASH;
        return lu;
    }

    // FROM STRING

    public static UUID fromString(final String str) {
        final int len = str.length();
        if (len != 36) {
            throw new IllegalArgumentException("Invalid UUID string (expected to be 36 characters long)");
        }
        final long[] vals = new long[2];
        int shift = 60;
        int index = 0;
        for (int i = 0; i < len; i++) {
            final int c = str.charAt(i);
            if (c >= lookup.length || lookup[c] == ERROR) {
                throw new IllegalArgumentException("Invalid UUID string (unexpected '" + str.charAt(i) + "' at position " + i + " -> " + str + " )");
            }

            if (lookup[c] == DASH) {
                if ((i - 8) % 5 != 0) {
                    throw new IllegalArgumentException("Invalid UUID string (unexpected '-' at position " + i + " -> " + str + " )");
                }
                continue;
            }
            vals[index] |= lookup[c] << shift;
            shift -= 4;
            if (shift < 0) {
                shift = 60;
                index++;
            }
        }
        return new UUID(vals[0], vals[1]);
    }

    // TO STRING

    // recode is 2-byte arrays representing the hex representation of every byte value (all 256)
    private static final char[][] recode = buildByteBlocks();
    private static char[][] buildByteBlocks() {
        final char[][] ret = new char[256][];
        for (int i = 0; i < ret.length; i++) {
            ret[i] = String.format("%02x", i).toCharArray();
        }
        return ret;
    }

    public static final String toString(final UUID uuid) {
        long msb = uuid.getMostSignificantBits();
        long lsb = uuid.getLeastSignificantBits();
        char[] uuidChars = new char[36];
        int cursor = uuidChars.length;
        while (cursor > 24 ) {
            cursor -= 2;
            System.arraycopy(recode[(int)(lsb & 0xff)], 0, uuidChars, cursor, 2);
            lsb >>>= 8;
        }
        uuidChars[--cursor] = '-';
        while (cursor > 19) {
            cursor -= 2;
            System.arraycopy(recode[(int)(lsb & 0xff)], 0, uuidChars, cursor, 2);
            lsb >>>= 8;
        }
        uuidChars[--cursor] = '-';
        while (cursor > 14) {
            cursor -= 2;
            System.arraycopy(recode[(int)(msb & 0xff)], 0, uuidChars, cursor, 2);
            msb >>>= 8;
        }
        uuidChars[--cursor] = '-';
        while (cursor > 9) {
            cursor -= 2;
            System.arraycopy(recode[(int)(msb & 0xff)], 0, uuidChars, cursor, 2);
            msb >>>= 8;
        }
        uuidChars[--cursor] = '-';
        while (cursor > 0) {
            cursor -= 2;
            System.arraycopy(recode[(int)(msb & 0xff)], 0, uuidChars, cursor, 2);
            msb >>>= 8;
        }
        return new String(uuidChars);
    }

}


为了娱乐,这里是我的测试类(我不知道Caliper) :

package uuid;

import java.util.Arrays;
import java.util.UUID;


public class PerformanceComparison 
{

    private final int N_UUIDS = 1000;
    private final UUID[] testUuids = new UUID[N_UUIDS];
    private final String[] testStrings = new String[N_UUIDS];

    public void setup () {
        for (int i = 0; i < N_UUIDS; i++)
        {
            testUuids[i] = UUID.randomUUID();
            testStrings[i] = testUuids[i].toString();
        }
    }

    public static void main(String[] args) {
        PerformanceComparison pc = new PerformanceComparison();

        final UUID uuidj = UUID.randomUUID();
        String valj = uuidj.toString();
        String valn = NessUUID.toString(uuidj);
        UUID uuidn = NessUUID.fromString(valn);
        if (!valj.equals(valn)) {
            throw new IllegalStateException("Illegal conversion");
        }
        if (!uuidj.equals(uuidn)) {
            throw new IllegalStateException("Illegal conversion");
        }

        pc.setup();
        final int reps = 1000000;

        System.out.println("    warmup");
        pc.runAll(reps);
        System.out.println("    Real Run");
        pc.runAll(reps);
        pc.runAll(reps);
        pc.runAll(reps);
        pc.runAll(reps);

    }

    private final void runAll(final int reps) {
        long[] accum = new long[4];
        System.out.printf("    JdkFrom: %6.2f JdkTo: %6.2f NessFrom: %6.2f NessTo: %6.2f %s\n", 
                timeJdkUuidFromString(reps, accum, 0) / 1000000.0,
                timeJdkUuidToString(reps, accum, 1) / 1000000.0,
                timeNessUuidFromString(reps, accum, 2) / 1000000.0,
                timeNessUuidToString(reps, accum, 3) / 1000000.0,
                Arrays.toString(accum));
    }

    public long timeJdkUuidFromString(int reps, long[] accum2, int j)
    {
        long accum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++)
        {
            accum += UUID.fromString(testStrings[i % N_UUIDS]).getMostSignificantBits();
        }
        accum2[j] = accum;
        return System.nanoTime() - start;
    }

    public long timeJdkUuidToString(int reps, long[] accum2, int j)
    {
        long accum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++)
        {
            accum += testUuids[i % N_UUIDS].toString().charAt(0);
        }
        accum2[j] = accum;
        return System.nanoTime() - start;
    }

    public long timeNessUuidFromString(int reps, long[] accum2, int j)
    {
        long accum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++)
        {
            accum += NessUUID.fromString(testStrings[i % N_UUIDS]).getMostSignificantBits();
        }
        accum2[j] = accum;
        return System.nanoTime() - start;
    }

    public long timeNessUuidToString(int reps, long[] accum2, int j)
    {

        long accum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++)
        {
            accum += NessUUID.toString(testUuids[i % N_UUIDS]).charAt(0);
        }
        accum2[j] = accum;
        return System.nanoTime() - start;
    }

}