A peek into the language and scripting files of Double Life World / Girl Cafe Gun

Posted by: Chise Hachiroku - Posted on:

While organising GCG.moe and providing more services and tips for players, I sometimes get support or resources from the official team, but in most cases I do not.

However, by the usual practice for fan re-creations, we are generally allowed to analyse the parts of the game package that are not encrypted, and this is the case with the language file and part of the scripting files for this particular game.

It is quite strange, though, to see a commercial game leave such a large portion unencrypted in standard Unity AssetBundles (UABs), especially the plot.

The discoveries described in this post are what made the Source Archive of GCG.moe possible. Without them, such a production would have been beyond imagination.

The Language File

Let’s get straight to it: the language file is a MonoBehaviour class in a file called lang.u.

I managed to transform the language file into a CSV file approximately a year ago. There was an undisclosed program that would show its contents in a shitty Python UI (presumably tkinter), but that program is highly inefficient, and a cross-examination of two files takes an unbelievable 20 minutes.

I did not know that there were ready-made solutions in .dll files. I assumed this was a proprietary format (well, I should not have) and decided to open the file in WinHex. After staring at the screen for hours, I managed to work out the file format to the extent that a program could read it:

  1. The meaning of the first 28 bytes (224 bits) is not known.
  2. A 32-bit unsigned int in little endian, giving the number of titles.
  3. Then, for each title:
    1. A 32-bit unsigned int in little endian, giving the length of the current entry in bytes.
    2. The current entry in plaintext.
    3. If the entry length is not a multiple of 4 bytes (32 bits), 0x00 bytes are appended to make it one.
      (The added 0x00 bytes do not count towards the aforementioned entry length.)
  4. Steps 2-3 are repeated, this time for the contents. Note that a valid file must have the same number of titles and contents, as they match one-to-one by their position in the respective arrays.
  5. EOF.
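
The layout above can be sketched in a few lines of Python. This is only an illustration of the steps in the list, not the private extractor mentioned in this post; the function names and the choice of UTF-8 for the "plaintext" entries are my assumptions.

```python
import struct

HEADER_SIZE = 28  # bytes of unknown meaning, skipped as-is


def read_string_array(data, offset):
    """Read one counted string array; return (entries, next offset)."""
    (count,) = struct.unpack_from("<I", data, offset)  # little-endian uint32
    offset += 4
    entries = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", data, offset)
        offset += 4
        # Assumption: entries are UTF-8 text.
        entries.append(data[offset:offset + length].decode("utf-8"))
        # Skip the entry plus its 0x00 padding up to a multiple of 4 bytes;
        # the padding is not part of the stored length.
        offset += (length + 3) & ~3
    return entries, offset


def parse_lang(data):
    """Return a title -> content dictionary built from the two arrays."""
    titles, offset = read_string_array(data, HEADER_SIZE)
    contents, offset = read_string_array(data, offset)
    if len(titles) != len(contents):
        raise ValueError("title and content counts differ")
    return dict(zip(titles, contents))
```

Calling parse_lang on the raw bytes of the extracted MonoBehaviour should then give a plain dictionary that can be dumped to CSV.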

This file format basically uses two arrays to store a dictionary (a std::map in C++). By now it is very clear that this is most likely NOT something new but an existing solution. I am not aware of what program creates and uses it, but the format is clearly meant for programs to read arrays efficiently, and nine times out of ten it was not designed for a dictionary, as it makes no sense to store the entry count twice.

It is very easy, at this stage, to write a program that reads it. I will not disclose my code for now, but if someday I am allowed to, it will be at https://github.com/c86-moe/GCG-lang.u-Extractor. This repo is currently private due to requirements from the parties concerned and my promises to them.

Being able to read the language file efficiently and keep its entries organised in memory paved the way for what comes next.

The Scripting Files

Scripts, from what I have seen by tapping processes, appear to be split across three AssetBundles: settings.u, lua.u and com.u. Only the last one is unencrypted. The first two remain black boxes to this day, but the unencrypted bundle contains the scripts that present the stories.

Having gone through the language file nearly a year ago, twists included, I realised that it would be extremely rare for the developers not to use an existing solution. I found some clues elsewhere in the package and confirmed that they are using xLua, an open-source project by Tencent that allows storywriters to compose their amazing creations in the nice Lua.

Overall, this is a good solution because it saves time and training, but the developers are using an obfuscation tool that is not very sound, and they know it. In production I have seen reports of crappy, syntax-incomplete Lua scripts leading to NullPointerException ANRs. Also, since the people writing the Lua scripts come from the humanities rather than engineering, the quality of the scripts varies greatly, thanks to Lua’s loose requirements regarding formality.

They have created a number of functions on top of those xLua provides, but there are only a few statements that I am particularly interested in:

  • set title (line title)
    Sets the title of the window to the contents of the specified line.
  • section (section number)
    Marks the start of a section. The end of a section is marked by the next section statement. It works pretty much like a label.
  • goto section (section number)
    Unconditionally jumps execution to the start of the specified section.
  • if choice (line title) goto section (section number)
    Creates a choice (usually 2-5 of them appear in a row). If the user picks this option, execution continues from the start of the specified section.
  • call set_speaker_and_play* (speaker id), (display area), (line title)
    The main ones we are after: they display most of the lines, and all of the lines shown in the dialogue box.
  • call show_cg_text* (speaker id), (line title)
    Displays most of the text that appears outside the dialogue box, which ends up shown as either (speaker’s name): (line content) or (line content). In the latter case, the contents of the line are usually already in the former format.
  • play textfade (display area), (line title)
    Shows lines together with pictures, as in the scenes that display (show off) cards.
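
To give an idea of how little is needed to pick these statements out, here is a minimal Python sketch. It assumes each statement sits on its own line and looks exactly like the forms listed above, with arguments as bare tokens separated by commas; the real scripts are looser than that, so the patterns and the kind names are hypothetical, not what my program uses.

```python
import re

ARG = r"([^\s,]+)"  # one bare argument token (assumed: no spaces or commas)

# Hypothetical patterns, one per statement form listed above.
# \d* absorbs the trailing number that stands in for the asterisk.
PATTERNS = [
    ("set_title", re.compile(r"^set title " + ARG + r"$")),
    ("section", re.compile(r"^section (\d+)$")),
    ("goto", re.compile(r"^goto section (\d+)$")),
    ("choice", re.compile(r"^if choice " + ARG + r" goto section (\d+)$")),
    ("speak", re.compile(
        r"^call set_speaker_and_play\d*\s+" + ARG + r",\s*" + ARG + r",\s*" + ARG + r"$")),
    ("cg_text", re.compile(r"^call show_cg_text\d*\s+" + ARG + r",\s*" + ARG + r"$")),
    ("textfade", re.compile(r"^play textfade " + ARG + r",\s*" + ARG + r"$")),
]


def classify(line):
    """Return (kind, args) for a recognised statement, else None."""
    text = line.strip()
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, match.groups()
    return None
```

Anything that does not match one of the patterns returns None and can simply be skipped.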

There are some general fields, as explained below:

  • * is replaced by a number, and I have seen several single-digit natural numbers used in scripts. Since the syntax for all of these is basically the same, my program just detects the fixed part.
  • Speaker ID is usually a number that is assigned to all the girls and most of the NPCs, and the name is taken from the line with the title _GirlInfo.(speaker id).NickName. This is not always the case, though, especially if the speaker is only used once or twice or is not a character at all; the program will then attempt to query this ID as the whole title of the line. There is also a special case of ‘-1’, which means this is a voiceover with no specific person saying it (‘O.S.’ in screenplays).
  • Display area is usually mytext, a default display area that is set at the start of scripts by load messagetext as mytext; my program ends up ignoring it.
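
The speaker lookup described above can be sketched as follows, assuming the language file has already been loaded into a dictionary called lang. The function name, the use of None for a voiceover and the final fallback to the raw ID are my assumptions for illustration.

```python
VOICEOVER = "-1"  # special speaker id: no specific person is speaking


def speaker_name(speaker_id, lang):
    """Resolve a speaker id to a display name, or None for a voiceover."""
    if speaker_id == VOICEOVER:
        return None
    nickname_key = "_GirlInfo.%s.NickName" % speaker_id
    if nickname_key in lang:
        return lang[nickname_key]
    # Otherwise query the id as the whole title of a line.
    # Assumption: if that fails too, show the raw id unchanged.
    return lang.get(speaker_id, speaker_id)
```

With a made-up entry such as {"_GirlInfo.3.NickName": "Girl A"}, speaker_name("3", lang) returns "Girl A", while speaker_name("-1", lang) returns None.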

Unlike assembly languages, Lua is a structured language: it delimits blocks with keywords such as then, do and end rather than brackets, and it has perfectly good loop syntax. Yet the developers still opted for goto, which should have been left in the ditch by higher-level languages (good work, Java). In all the cases I have manually inspected, the section each choice refers to is mostly 2-10 lines long before it jumps to a common section. Tiny branches like these could definitely be expressed with a properly written if statement.

With these in mind, we can write a comprehensive CSV or JSON that combines the sections and branches from the scripts with the lines from the language file. I later wrote an interpreter in C++ that does exactly this. The child of this interpreter is the Source Archive.
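
As a rough picture of that output, the Python sketch below groups already-recognised statements by section and resolves line titles through the language dictionary. It is not the C++ interpreter: the (kind, args) tuples, the kind names and the JSON field names are all hypothetical.

```python
import json


def build_sections(statements, lang):
    """Group (kind, args) statements by section and resolve line titles."""
    sections = {}
    current = None
    for kind, args in statements:
        if kind == "section":
            # A section runs until the next section statement.
            current = sections.setdefault(args[0], {"lines": [], "branches": []})
        elif current is None:
            continue  # statements before the first section are skipped here
        elif kind == "line":
            speaker, title = args
            current["lines"].append(
                {"speaker": speaker, "text": lang.get(title, title)})
        elif kind == "choice":
            title, target = args
            current["branches"].append(
                {"text": lang.get(title, title), "goto": target})
        elif kind == "goto":
            # Unconditional jump: a branch without a choice text.
            current["branches"].append({"text": None, "goto": args[0]})
    return sections


def to_json(statements, lang):
    return json.dumps(build_sections(statements, lang),
                      ensure_ascii=False, indent=2)
```

Each section then carries its own lines plus the sections it can jump to, which is enough to rebuild the branching of a story.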

Some Last Words

Double Life World / Girl Cafe Gun is a good game in general, with its astonishing Live2D and experimental yet creative gameplay. But to be honest, its codebase sucks. It crashes on most Samsung devices and does not run at all starting from One UI 5, and that means one in five smartphones will experience such issues.

As I have mentioned earlier, its Lua scripts are pretty casual. Some function calls use brackets, but most do not.

Do you still remember the ready-made Python solution that could do cross-examination? I modified my extractor to do the same, and it takes 90 seconds at most. C / C++ is an efficient programming language.
