I've been experimenting with mail services to see which ones allow images to be viewed without explicit user request, and how they go about it if they do.
I made a little script that creates images as they're requested (big bold banners that say you shouldn't really see this without expecting it) and logs all the headers that might be of use. Rewrite voodoo makes it look like a static image file, and I enabled every header I could think of to turn off caching.
I started with Gmail, since they changed their policy/process on this a little while ago and I've always wondered about it. Their announcement said they'd made it safe to show some images automatically.
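For anyone who wants to try something similar, here is a minimal Python sketch of the two pieces described above: headers that ask clients and proxies not to cache, and a log line built from the request. It is not the actual script. The function names are made up, the request is assumed to arrive as a WSGI-style environ dict, and both timestamps are assumed to come from the same clock.

```python
import time

def no_cache_headers():
    """Response headers that ask browsers and proxies not to cache the image."""
    return {
        "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
        "Pragma": "no-cache",  # for old HTTP/1.0 clients
        "Expires": "0",        # an invalid date is treated as already expired
    }

def log_line(environ, now=None):
    """Build one space-separated log line from a WSGI-style environ dict.

    Assumed field order: human-readable UTC timestamp, epoch time with
    milliseconds, remote IP, X-Forwarded-For (only if present), path,
    query string, user-agent.
    """
    now = time.time() if now is None else now
    stamp = time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime(now))
    fields = [stamp, "%.3f" % now, environ.get("REMOTE_ADDR", "-")]
    forwarded = environ.get("HTTP_X_FORWARDED_FOR")
    if forwarded:
        fields.append(forwarded)
    fields += [
        environ.get("PATH_INFO", "-"),
        environ.get("QUERY_STRING", "-"),
        environ.get("HTTP_USER_AGENT", "-"),
    ]
    return " ".join(fields)
```

The headers would be sent with every generated image, and the log line appended to a file once per request.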
I sent myself a message, the idea being that it would almost certainly be judged a safe source. Later, I intend to see how these services respond to spoofed e-mails, but that's not for now.
The image is loaded, great. I check the log and find I get just a Google proxy IP and a helpful user-agent too:
Wed, 02 Mar 2016 08:51:29 1456908689.717 184.108.40.206 /[redacted]/s2.jpg imname=s2 Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via ggpht.com GoogleImageProxy)
The fields are, in order: server timestamp, request time from the header, IP, image requested and header info, and user-agent.
But I also got a line just before that one, and it's a bit more interesting.
Wed, 02 Mar 2016 08:35:29 1456907729.842 220.127.116.11 xx.xx.xx.xx /[redacted]/clock.jpg imname=clock Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google Favicon
The X-Forwarded-For header (http_x_forwarded_for) is set, and it is my public IP (shown as xx.xx.xx.xx here). The requested image wasn't mentioned in the e-mail, but it is one I've used for testing purposes while developing the scripts. The user-agent is another Google bot, but not the proxy from before. And that image isn't set as the favicon at all, anywhere. It popped up this morning before I opened the e-mail, but the e-mail was sent the night before, so it's been sitting in my inbox, and my Gmail inbox has been open on this machine.
I'm currently trying to track down the cause and reproduce it, but at the moment, all I know is that some Google/Gmail "thing" is reacting to what I'm doing on my computer and accessing things on my behalf. It could be the browser, but not directly.
I can't get my head around it at the moment - if it's a crawler, why is it referring to my IP? If it's my browser, why does it identify as Mozilla/Firefox on Windows when that combination doesn't exist on my IP, and why as a crawler at all?
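To compare entries like the two log lines above without eyeballing them, something along these lines could split a line into fields and flag which Google fetcher made the request. This is only a sketch: the regex assumes the log format described earlier, with the forwarded-for field optional, and the names LOG_RE, parse_line and classify_agent are invented for illustration. The two user-agent markers are simply the strings that appear in my log, not an official list.

```python
import re

# Assumed layout: timestamp, epoch.millis, IP, optional forwarded-for,
# path, query string, then the user-agent as the rest of the line.
LOG_RE = re.compile(
    r"^(?P<stamp>\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}) "
    r"(?P<epoch>\d+\.\d+) "
    r"(?P<ip>\S+) "
    r"(?:(?P<forwarded>[^/\s]\S*) )?"  # only present on some requests
    r"(?P<path>/\S*) "
    r"(?P<query>\S+=\S*) "
    r"(?P<agent>.+)$"
)

def parse_line(line):
    """Return a dict of fields, or None if the line doesn't match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None

def classify_agent(agent):
    """Label the fetcher using the markers seen in the user-agent strings."""
    if "GoogleImageProxy" in agent:
        return "image proxy"
    if "Google Favicon" in agent:
        return "favicon fetcher"
    return "other"
```

Running both log lines through parse_line gives a forwarded value of None for the proxy request and xx.xx.xx.xx for the odd one, which makes the difference easy to spot when there are many entries.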
So I started out looking at one thing that tickled my curiosity and now I'm looking at a different puzzle. Oh, and I tried Googling but found no answer, so clearly this is a conspiracy of some sort. It usually is if you look hard enough.