In Part 1, I explained my motives for writing software to play Bejeweled Blitz. In Part 2, I defined the terms and general outline of a program to automatically play Bejeweled Blitz. In Part 3, I'll start at the screen level and work all the way down to the pixel, showing how I detect the color and state of each gem on the grid.
Where to Begin
I run 64-bit Ubuntu at home, and my browser is Firefox. To capture and analyze the contents of the screen, I used Xlib API calls. In X, windows are laid out in a hierarchy starting with the root window, which holds the desktop, taskbars, and all the top-level application windows. So first I open my display with XOpenDisplay and store it for the life of the capture job, since it gets used throughout the process. Next, I wrote a function to search a given window for the word "Bejeweled" in its title (read with XGetWMName). If found, it returns a handle to the window. Otherwise it uses XQueryTree to get an array of all the immediate children and recursively calls itself with each of them. Then it is just a simple matter of calling that function initially with the result of RootWindow(disp, DefaultScreen(disp)).
Now we drop down a level from screen to window: specifically, the Firefox browser in which Bejeweled is running. To get an image of that window, I call XGetWindowAttributes to see how big it is, and then XGetImage to get an XImage pointer that I can analyze pixel by pixel. To get a pixel, I use XGetPixel, passing the XImage that resulted from XGetImage and the X/Y coordinates of the pixel I want. This returns the pixel value as an unsigned long, which on a TrueColor display holds the packed RGB value. I can break it into separate color levels with a little ANDing and shifting.
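The ANDing and shifting can be sketched as follows. This is a minimal illustration, not the original code: the Rgb type and unpack_pixel name are hypothetical, and it assumes the 0xRRGGBB layout that is typical of 24-bit TrueColor visuals.

```c
#include <stdint.h>

/* Hypothetical helper type: one color split into 8-bit channels. */
typedef struct {
    uint8_t r, g, b;
} Rgb;

/* Split a packed pixel value (as returned by XGetPixel) into channels.
 * Assumption: 0xRRGGBB layout. Other visuals would need the XImage's
 * red_mask, green_mask, and blue_mask fields to find each channel. */
Rgb unpack_pixel(unsigned long pixel)
{
    Rgb c;
    c.r = (uint8_t)((pixel >> 16) & 0xFF); /* bits 16..23 */
    c.g = (uint8_t)((pixel >> 8) & 0xFF);  /* bits 8..15  */
    c.b = (uint8_t)(pixel & 0xFF);         /* bits 0..7   */
    return c;
}
```

With this in place, a call such as unpack_pixel(XGetPixel(image, x, y)) would yield the three color levels for one pixel.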
Once I know the grid origin, I can divide the grid into an 8x8 array of cells, with each cell 40x40 pixels in size. At this point I had to get a little more creative, because of the dynamic nature of the gems and the board. The background changes color frequently in response to multiplier changes, power-ups, and game mode. The gems also spin when they're clicked on, making pixel-by-pixel identification impossible. The key here is to focus on what matters, and to eliminate that which doesn't.
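The cell arithmetic is simple enough to show in a few lines. The sketch below uses the 8x8 grid and 40-pixel cells described above; the Point type and both function names are made up for illustration, and the grid origin is taken as a given input.

```c
#define GRID_SIZE 8   /* cells per side */
#define CELL_SIZE 40  /* pixels per cell side */

/* Hypothetical helper type: a pixel position within the window image. */
typedef struct {
    int x, y;
} Point;

/* Top-left pixel of the cell at (col, row), both in 0..7. */
Point cell_origin(Point grid_origin, int col, int row)
{
    Point p;
    p.x = grid_origin.x + col * CELL_SIZE;
    p.y = grid_origin.y + row * CELL_SIZE;
    return p;
}

/* Map a window pixel back to its cell. Returns 1 and fills col/row
 * when the pixel lies inside the grid, 0 otherwise. */
int pixel_to_cell(Point grid_origin, int px, int py, int *col, int *row)
{
    int dx = px - grid_origin.x;
    int dy = py - grid_origin.y;
    if (dx < 0 || dy < 0 ||
        dx >= GRID_SIZE * CELL_SIZE || dy >= GRID_SIZE * CELL_SIZE) {
        return 0;
    }
    *col = dx / CELL_SIZE;
    *row = dy / CELL_SIZE;
    return 1;
}
```

Looping col and row from 0 to 7 and calling cell_origin for each pair visits all 64 cells.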
Special Multiplier Processing
Sensing the Aura
The next bit of special processing is to determine whether the cell is flaming or is a crosshair. These are almost as important to detect as multipliers, because using them increases the chance of getting a multiplier. This is especially true of crosshairs, which will generate a multiplier on every use, so long as the multiplier time limit has expired. Crosshairs also require some special color processing later, so we need to know if the current cell is a crosshair before we start looking at color.
The way I detect crosshairs and flames is to look at the top-middle of the cell, actually bleeding over into the cell above it by one pixel and extending down two pixels into the current cell. This area is never occupied by gems at rest, so it is a good place to search for auras. Based on the average color in this region, I determine if we have a crosshair, a flame, or a normal gem in this cell.
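Here is one way the aura check could look. It is a sketch under several assumptions, not the original implementation: pixels are taken from a flat array of packed 0xRRGGBB values (for example, copied out with XGetPixel), the sampled strip is 8 pixels wide (the width is a guess), and the classification picks the nearest of two reference colors that the caller would have to sample from the real game. All names here are hypothetical.

```c
/* Hypothetical types for this sketch. */
typedef struct {
    int r, g, b;
} AvgColor;

enum CellState { CELL_NORMAL, CELL_FLAME, CELL_CROSSHAIR };

#define AURA_HALF_WIDTH 4 /* assumed: sample 8 pixels around the middle */

/* Average the strip at the top-middle of a cell: one row in the cell
 * above (cell_y - 1) plus the first two rows of the current cell.
 * pixels is a row-major image, width pixels per row, 0xRRGGBB values. */
AvgColor average_aura(const unsigned long *pixels, int width,
                      int cell_x, int cell_y, int cell_size)
{
    long sum_r = 0, sum_g = 0, sum_b = 0;
    int count = 0;
    int mid = cell_x + cell_size / 2;

    for (int y = cell_y - 1; y <= cell_y + 1; y++) {
        for (int x = mid - AURA_HALF_WIDTH; x < mid + AURA_HALF_WIDTH; x++) {
            unsigned long p = pixels[(long)y * width + x];
            sum_r += (long)((p >> 16) & 0xFF);
            sum_g += (long)((p >> 8) & 0xFF);
            sum_b += (long)(p & 0xFF);
            count++;
        }
    }

    AvgColor avg = { (int)(sum_r / count), (int)(sum_g / count),
                     (int)(sum_b / count) };
    return avg;
}

/* Squared distance between two colors in RGB space. */
static long color_dist2(AvgColor a, AvgColor b)
{
    long dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

/* Pick the closer reference aura, or CELL_NORMAL if neither is within
 * max_dist2. The reference colors and threshold are caller-supplied;
 * they are not known values from the game. */
enum CellState classify_aura(AvgColor avg, AvgColor flame_ref,
                             AvgColor crosshair_ref, long max_dist2)
{
    long d_flame = color_dist2(avg, flame_ref);
    long d_cross = color_dist2(avg, crosshair_ref);

    if (d_flame <= d_cross && d_flame <= max_dist2) return CELL_FLAME;
    if (d_cross < d_flame && d_cross <= max_dist2) return CELL_CROSSHAIR;
    return CELL_NORMAL;
}
```

Averaging over a small strip rather than testing single pixels makes the check more tolerant of animation in the aura. Note that the caller must make sure cell_y - 1 is still inside the image, which holds as long as the grid does not start at the very top row of the window.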
Once I know the color and state, I can move on to the next cell, repeating the process until all 64 cells have been identified. If falling gems, hypercubes, or other temporary embellishments cause a cell to be undetected or mis-detected, it usually has little to no effect on the outcome of the game. There is enough going on at any given time that little pockets of misinformation can be absorbed.
The conclusion is in Part 4: Playing the Game