What does this code do?

The input parameter is an 8-bit ASCII string.

push ebp
mov ebp, esp
push ebx
mov ebx, [ebp + 8]
mov eax, ebx
_loop:
mov edx, [eax]
add eax, 4
lea ecx, [edx - 01010101h]
not edx
and ecx, edx
test ecx, 80808080h
jz _loop
test ecx, 8080h
jnz _label
shr ecx, 10h
add eax, 2
_label:
add cl, cl
sbb eax, 3
sub eax, ebx
pop ebx
leave
ret

The original code was very compact (it was found in a virus sample); I modified it (and made it uglier) so that it assembles with NASM. Bonus question: why write it this way instead of the usual way?

4 Comments

  1. Posted 06/12/2012 at 12:55 am

    This code computes the length of a string, equivalent to strlen() in C. But I'm not sure why it is written this way; could you explain it to me? :d

  2. npson
    Posted 06/12/2012 at 9:42 am

    1. This is an optimized strlen() that scans the string one DWORD (4 bytes) at a time. I will write this in English so everybody can understand.

    line 7: load the 4 bytes pointed to by EAX into EDX.
    line 9: subtract 1 from each byte and store the result in ECX. A NULL byte becomes 0xFF in ECX.
    line 10: invert the 4 original bytes in EDX. A NULL byte becomes 0xFF in EDX.
    line 11: the AND keeps 0xFF only where both values have it, i.e. in the NULL bytes. (The borrow from line 9 can also mark bytes above a NULL byte, but the lowest marked byte is always a real NULL.)
    line 12: test the highest bit of each byte (0x80 = 0b10000000). A 0xFF byte makes the CPU clear ZF (ZF = 0), i.e. the loop stops when there is a NULL byte among the 4 bytes loaded from the pointer in EAX.
    line 14: check whether the NULL byte is in the low 16 bits of ECX. This is actually a check for (string length % 4) == 0 or == 1.
    line 16 & 17: if (length % 4) == 2 or == 3, the 2 high bytes of ECX are shifted into the 2 low bytes of ECX and EAX is advanced by 2.
    line 19: CF is set if the highest bit of CL is set, i.e. CL is 0xFF. It's a smart check for (length % 4) == 0 or == 2.
    line 20: SBB subtracts 3 plus CF from EAX. Together with the +2 from line 17, the net adjustment is -4, -3, -2 or -1 for length % 4 == 0, 1, 2 or 3, hence we get the correct string length after subtracting the pointer to the ASCII string (line 21).

    2. This is faster than a traditional byte-by-byte strlen(): it needs only N/4 + 1 loop iterations to find the NULL in an N-byte string.
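
The line-by-line explanation above can be sketched in C roughly as follows. This is only an illustration, not the virus author's or Agner Fog's code: the name strlen_dword is hypothetical, the sketch assumes a little-endian machine (as on x86), and, like the assembly, it may read up to 3 bytes past the terminating NULL, so the buffer is assumed to be padded to a multiple of 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring the assembly; "line N" refers to the listing.
 * Assumes little-endian byte order and a buffer padded to a multiple of 4. */
static size_t strlen_dword(const char *s)
{
    const char *p = s;
    uint32_t mask;

    assert(s != NULL);
    do {
        uint32_t w;
        memcpy(&w, p, sizeof w);            /* line 7: load 4 bytes */
        p += 4;                             /* line 8 */
        mask = (w - 0x01010101u) & ~w;      /* lines 9-11 */
    } while ((mask & 0x80808080u) == 0);    /* lines 12-13: no NULL yet */

    if ((mask & 0x8080u) == 0) {            /* lines 14-15: NULL not in low half */
        mask >>= 16;                        /* line 16 */
        p += 2;                             /* line 17 */
    }
    /* lines 19-20: bit 7 of the low byte plays the role of CF in sbb eax, 3 */
    p -= 3 + ((mask >> 7) & 1u);
    return (size_t)(p - s);                 /* line 21 */
}
```

For example, with char buf[8] = "abc", strlen_dword(buf) returns 3, the same value as strlen(buf).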

    • abc
      Posted 07/12/2012 at 5:22 pm

      Great explanation! I agree that the number of loop iterations is reduced by a factor of 4; however, I don’t think the actual runtime is reduced that much.
      Each iteration of the main loop above takes 7 cycles. Therefore, it takes roughly 7n/4 cycles to scan the string.
      Let’s consider the naive implementation using “repne scasb”. Each iteration in this case takes 4 cycles, so in total we need 4n cycles to find the NULL character.
      Therefore, the above implementation is around 16/7 times faster than the naive one, assuming no pipelining.
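
Spelled out, the estimate is the ratio of the two cycle counts. The symbols T_scasb and T_dword are labels introduced here for illustration, and the 4-cycle and 7-cycle per-iteration figures are taken from the comment as given, not measured:

```latex
\[
\frac{T_{\text{scasb}}}{T_{\text{dword}}}
  = \frac{4n}{7n/4}
  = \frac{16}{7}
  \approx 2.29
\]
```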

  3. sontran
    Posted 08/12/2012 at 7:10 am

    Loop unrolling: instead of checking byte by byte, this code checks 4 bytes at a time.

    The author of this virus copied it from Agner Fog's optimization trick http://www.agner.org/optimize/optimizing_assembly.pdf (strlen),
    which Agner Fog gave the fancy name “vector operations in general purpose registers”.
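
To illustrate what treating a general purpose register as a vector of four bytes looks like, here is a small C sketch of the zero-byte test on a single 32-bit word. The function names are hypothetical and the sketch is not taken from Agner Fog's manual. Lane 0 is the least significant byte, which is the first character on a little-endian machine. The last example shows that a 0x01 byte sitting above a NULL byte is flagged too, because the borrow propagates upward, which is why only the lowest flagged lane is trusted.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 7 of each byte lane is set if that lane is flagged as zero. */
static uint32_t zero_lane_flags(uint32_t w)
{
    return (w - 0x01010101u) & ~w & 0x80808080u;
}

/* Index (0..3) of the lowest flagged lane, or -1 if no lane is flagged.
 * The lowest flagged lane is always a genuine zero byte. */
static int first_zero_lane(uint32_t w)
{
    uint32_t flags = zero_lane_flags(w);
    for (int lane = 0; lane < 4; lane++) {
        if (flags & (0x80u << (8 * lane)))
            return lane;
    }
    return -1;
}

/* Self-checking examples for the two helpers above. */
static void zero_lane_examples(void)
{
    assert(zero_lane_flags(0x64636261u) == 0);           /* "abcd": no zero byte */
    assert(zero_lane_flags(0x00636261u) == 0x80000000u); /* "abc\0": lane 3 */
    assert(first_zero_lane(0x64006261u) == 2);           /* zero byte in lane 2 */
    /* Lane 1 holds 0x01 but is flagged anyway: the borrow from lane 0 spills over. */
    assert(zero_lane_flags(0x00000100u) == 0x80808080u);
    assert(first_zero_lane(0x00000100u) == 0);
}
```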
