Wednesday, June 13, 2007
The first feature available on Draft is Video Upload, accessible via a new button in the Post Editor. Check it out!
You've probably already figured this out if you use webmaster tools, the webmaster help center, or our webmaster discussion forum, but the webmaster central team is a fantastic group of people. You have seen some of them helping out in the discussion forums, and you may have met a few more at conferences, but there are lots of others behind the scenes who you don't see, working on expanding webmaster tools, writing content, and generally doing all they can for you, the webmaster. Even the team members you don't see are paying close attention to your feedback: reading our discussion forum, as well as blogs and message boards. We introduced you to a few of the team before SES NY and Danny Sullivan told you about a few Googler alternatives before SES Chicago. We also have several interns working with us right now, including Marcel, who seems to have been the hit of the party at SMX Advanced.
I am truly pleased to welcome a new addition to the team, although she'll be a familiar face to many of you already. Susan Moskwa is joining Jonathan Simon as a webmaster trends analyst! She's already started posting on the forums and is doing lots of work behind the scenes. Jonathan does a wonderful job answering your questions and investigating issues that come up, and he and Susan will make a great team. Susan is a bit of a linguistic genius, so she'll also be helping out in some of the international forums, where Dublin Googlers have started reading and replying to your questions. Want to know more about Susan? You just never know what you'll find when you do a Google search.
eBay Live attendees have plenty of activities to keep them busy this week in Boston, and we did not want to detract from that activity. After speaking with officials at eBay, we at Google agreed that it was better for us not to feature this event during the eBay Live conference. Google is constantly reaching out to new users and sellers, and we are available to privately discuss any matters of concern with individuals as they relate to Google products. Interested parties may contact us at firstname.lastname@example.org.
Last week, I participated in the duplicate content summit at SMX Advanced. I couldn't resist the opportunity to show how Buffy is applicable to the everyday search marketing world, but mostly I was there to get input from you on the duplicate content issues you face and to brainstorm how search engines can help.
A few months ago, Adam wrote a great post on dealing with duplicate content. The most important things to know about duplicate content are:
- Google wants to serve up unique results and does a great job of picking a version of your content to show if your site includes duplication. If you don't want to worry about sorting through duplication on your site, you can let us worry about it instead.
- Duplicate content doesn't cause your site to be penalized. If duplicate pages are detected, one version will be returned in the search results to ensure variety for searchers.
- Duplicate content doesn't cause your site to be placed in the supplemental index. Duplication may indirectly influence this, however, if links to your pages are split among the various versions, causing lower per-page PageRank.
Specifying the preferred version of a URL in the site's Sitemap file
One thing we discussed was the possibility of specifying the preferred version of a URL in a Sitemap file, with the suggestion that if we encountered multiple URLs that point to the same content, we could consolidate links to that page and could index the preferred version.
Providing a method for indicating parameters that should be stripped from a URL during indexing
We discussed providing this either in an interface such as webmaster tools or in the site's robots.txt file. For instance, if a URL contains session IDs, the webmaster could indicate the variable for the session ID, which would help search engines index the clean version of the URL and consolidate links to it. The audience leaned towards an addition in robots.txt for this.
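To make the idea concrete, here is a minimal sketch of what "stripping a declared parameter" could mean in practice. It assumes the webmaster has declared a session parameter named `sessionid`, which is a hypothetical name. The `strip_params` helper is also our own illustration, not a Google API. Once the parameter is dropped, two URLs that differ only by session ID reduce to the same clean URL, so links to them could be consolidated.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url: str, params_to_strip: set[str]) -> str:
    """Remove the named query parameters (case-insensitively) and rebuild the URL.

    Hypothetical helper for illustration; note that parse_qsl/urlencode may
    normalize the percent-encoding of the parameters that are kept.
    """
    drop = {p.lower() for p in params_to_strip}
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in drop
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # Two visits with different (hypothetical) session IDs map to one clean URL.
    a = "http://www.example.com/page.php?id=42&sessionid=abc123"
    b = "http://www.example.com/page.php?sessionid=zzz999&id=42"
    print(strip_params(a, {"sessionid"}))
    print(strip_params(b, {"sessionid"}))
```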
Providing a way to authenticate ownership of content
This would provide search engines with extra information to help ensure we index the original version of an article, rather than a scraped or syndicated version. Note that we do a pretty good job of this now, and not many people in the audience mentioned this as a primary issue. However, the audience was interested in a way of authenticating content as an extra protection. Some suggested using the page with the earliest date, but creation dates aren't always reliable. Someone also suggested allowing site owners to register content, although that could raise issues as well: non-savvy site owners wouldn't know to register content, and someone else could take the content and register it instead. We currently rely on a number of factors such as the site's authority and the number of links to the page. If you syndicate content, we suggest that you ask the sites that are using your content to block their version with a robots.txt file as part of the syndication arrangement, to help ensure your version is served in results.
Making a duplicate content report available for site owners
There was great support for the idea of a duplicate content report that would list pages within a site that search engines see as duplicate, as well as pages that are seen as duplicates of pages on other sites. In addition, we discussed the possibility of adding an alert system to this report so site owners could be notified via email or RSS of new duplication issues (particularly external duplication).
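As a rough illustration of what such a report might list, the sketch below groups pages whose text is identical after trivial normalization. This is an assumption-laden simplification and not how any search engine detects duplicates. It catches only exact matches after whitespace and case folding, whereas real systems would need near-duplicate detection. The `duplicate_report` function and the sample URLs are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so formatting-only differences vanish."""
    return re.sub(r"\s+", " ", text).strip().lower()


def duplicate_report(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose normalized text is identical (exact matches only)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups with more than one URL are duplicates worth reporting.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    site = {
        "http://www.example.com/": "Hello  World",
        "http://www.example.com/2007/06/post.html": "hello world",
        "http://www.example.com/about": "About this blog",
    }
    for group in duplicate_report(site):
        print("Possible duplicates:", group)
```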
Working with blogging software and content management systems to address duplicate content issues
Some duplicate content issues within a site are due to how the software powering the site structures URLs. For instance, a blog may have the same content on the home page, a permalink page, a category page, and an archive page. We are definitely open to talking with software makers about the best way to provide easy solutions for content creators.
In addition to discussing potential solutions to duplicate content issues, the audience had a few questions.
Q: If I nofollow a substantial number of my internal links to reduce duplicate content issues, will this raise a red flag with the search engines?
The number of nofollow links on a site won't raise any red flags, but that is probably not the best method of blocking the search engines from crawling duplicate pages, as other sites may link to those pages. A better method may be to block pages you don't want crawled with a robots.txt file.
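A small sketch of the robots.txt approach follows. The `/print/` and `/archive/` paths are hypothetical stand-ins for whatever duplicate views your own site generates. Python's standard `urllib.robotparser` module is used to check which URLs the rules would block. It is a simple prefix matcher, not Googlebot's own implementation, so treat it as a quick sanity check rather than a guarantee of how any crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /print/ and /archive/ are assumed duplicate views.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /archive/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Report whether the parsed rules allow `agent` to crawl `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("http://www.example.com/2007/06/post.html",
                "http://www.example.com/print/2007/06/post.html"):
        print(url, is_crawlable(rules, url))
```

Unlike nofollow on internal links, the block applies no matter who links to the duplicate URL. That covers the case mentioned above, where other sites link to those pages.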
Q: Are the search engines continuing the Sitemaps alliance?
We launched sitemaps.org in November of last year and have continued to meet regularly since then. In April, we added the ability for you to let us know about your Sitemap in your robots.txt file. We plan to continue to work together on initiatives such as this to make the lives of webmasters easier.
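The robots.txt addition is a single `Sitemap:` line giving the Sitemap's full URL. The example below assumes a Sitemap at a hypothetical `http://www.example.com/sitemap.xml`. It reads the directive back with `RobotFileParser.site_maps()` from Python's standard library, which is available in Python 3.8 and later and returns `None` when no `Sitemap:` lines are present.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt announcing a Sitemap; the URL is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
"""


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return every Sitemap URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None (not an empty list) when there are no Sitemap lines.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_from_robots(ROBOTS_TXT))
```

The `Sitemap:` directive is independent of any `User-agent` group, so it can appear anywhere in the file.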
Q: Many pages on my site primarily consist of graphs. Although the graphs are different on each page, how can I ensure that search engines don't see these pages as duplicate since they don't read images?
To ensure that search engines see these pages as unique, include unique text on each page (for instance, a different title, caption, and description for each graph) and include unique alt text for each image. (For instance, rather than use alt="graph", use something like alt="graph that shows Willow's evil trending over time".)
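If you have many graph pages, a quick self-audit can flag images whose alt text is generic or reused across pages. The sketch below uses Python's standard `html.parser`. The `GENERIC_ALTS` list and the function names are our own assumptions for illustration, not a rule any search engine publishes.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: a small, hand-picked list of alt values that describe nothing specific.
GENERIC_ALTS = {"", "graph", "chart", "image", "picture"}


class ImgAltCollector(HTMLParser):
    """Collect the alt text of every <img> tag in one page."""

    def __init__(self) -> None:
        super().__init__()
        self.alts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # A missing or valueless alt attribute is recorded as "".
            self.alts.append((dict(attrs).get("alt") or "").strip())


def weak_alt_texts(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each page URL to alt texts that are generic or reused on another page."""
    alts_by_page: dict[str, list[str]] = {}
    for url, html in pages.items():
        collector = ImgAltCollector()
        collector.feed(html)
        collector.close()
        alts_by_page[url] = collector.alts
    # Count each alt text once per page, so only cross-page reuse is flagged.
    pages_using = Counter(alt for alts in alts_by_page.values() for alt in set(alts))
    return {
        url: [alt for alt in alts if alt.lower() in GENERIC_ALTS or pages_using[alt] > 1]
        for url, alts in alts_by_page.items()
    }
```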
Q: I've syndicated my content to many affiliates and now some of those sites are ranking for this content rather than my site. What can I do?
If you've freely distributed your content, you may need to enhance and expand the content on your site to make it unique.
Q: As a searcher, I want to see duplicates in search results. Can you add this as an option?
We've found that most searchers prefer not to have duplicate results. This audience member in particular commented that she may not want to get information from one site and would like other choices, but in that case, other sites will likely not have identical information and will therefore show up in the results. Bear in mind that you can add the "&filter=0" parameter to the end of a Google web search URL to see additional results which might be similar.
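For anyone who builds search links programmatically, here is a minimal sketch of appending that parameter. It assumes the standard `http://www.google.com/search` endpoint with the query in `q`. The `google_search_url` helper is hypothetical, and only the `filter=0` parameter comes from the answer above.

```python
from urllib.parse import urlencode


def google_search_url(query: str, show_similar: bool = False) -> str:
    """Build a web search URL; show_similar adds filter=0 to include similar results."""
    params = {"q": query}
    if show_similar:
        params["filter"] = "0"
    # urlencode escapes the query (spaces become '+') and joins parameters with '&'.
    return "http://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(google_search_url("duplicate content", show_similar=True))
```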
I've brought all the issues and potential solutions that we discussed at the summit back to my team and others within Google, and we'll continue to work on providing the best search results and expanding our partnership with you, the webmaster. If you have additional thoughts, we'd love to hear about them!
Tuesday, June 12, 2007
The Windows Live Writer team has released Beta 2 of their software, which adds labels support along with the proverbial host of other features. Developer and friend of Blogger Joe Cheng covers the highlights and lowlights on his blog, and you can grab the beta for Windows Vista and XP over at the download page.
In the Macintosh corner, Red Sweater Software's MarsEdit has been updated to version 1.2, adding Blogger photo upload support via Picasa Web Albums. Developer Daniel Jalkut (also a friend of Blogger) describes the update on Red Sweater's blog. You can download a 30 day trial (Mac OS X 10.3.9 or higher) from the MarsEdit page.
For bonus additional Blogger goodness (for Mac users), grab the newly updated, newly working again Blogger Dashboard widget from Google's widget page. F12 + typing = blog post.
Working on a Blogger client of your own? Make sure you're hanging out in the Blogger Dev group to chat and keep in touch.
Starting today, there's a new feature that makes Custom Search Engines (CSEs) even easier to create and keep up to date.
You can now create a CSE by simply placing a small piece of tailored code on a page on your site. With that one piece of code, Google's search technology will automatically include in your new CSE all of the sites you have linked to from that page, creating a dynamic, powerful and tailored search experience really quickly. Moreover, your new CSE will update itself periodically to include any new links added to that page.
So, if you have a blog or a directory-like site and don't feel like listing all of the URLs you want to search across, you can leave the work to us. With this new feature we'll automatically generate and update your CSE for you. For example, try the query 'sculpture' on this CSE dynamically created from a page of links to kids' museums, or the query 'planning' on the search engine about Artificial Intelligence we created from the page of links at Berkeley.
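To picture what "all of the sites you have linked to from that page" amounts to, the sketch below pulls the distinct web hostnames out of a links page using Python's standard `html.parser`. This is a hypothetical illustration of the input a dynamic CSE starts from, not Google's implementation. Page URLs and hostnames in the usage are placeholders. Re-running the function after new links are added would pick them up, in the spirit of the periodic refresh described above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkedSites(HTMLParser):
    """Collect the distinct hostnames linked to from one page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.hosts: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page, then keep only web links.
        parts = urlsplit(urljoin(self.page_url, href))
        if parts.scheme in ("http", "https") and parts.hostname:
            self.hosts.add(parts.hostname)  # .hostname is already lowercased


def sites_to_search(page_url: str, html: str) -> list[str]:
    """Return the sorted set of sites a links page points to (hypothetical helper)."""
    collector = LinkedSites(page_url)
    collector.feed(html)
    collector.close()
    return sorted(collector.hosts)
```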
Pretty cool, eh? We think so too. There are many powerful things you can do with this new feature, and in the near future we'll be talking about different possibilities. In the meantime, however, feel free to get your dynamic Custom Search Engine up and running. We'll be back in an instant.
Keep the feedback and great ideas coming!
Links are an important signal in our PageRank calculations, as they tend to indicate when someone has found a page useful. Links that are purchased are great for advertising and traffic purposes, but aren't useful for PageRank calculations. Buying or selling links to manipulate results and deceive search engines violates our guidelines.
Today, in response to your request, we're providing a paid links reporting form within Webmaster Tools. To use the form, simply log in and provide information on the sites buying and selling links for purposes of search engine manipulation. We'll review each report we get and use this feedback to improve our algorithms and our search results. In some cases we may also take individual action on sites.
If you are selling links for advertising purposes, there are many ways you can designate this, including:
- Adding a rel="nofollow" attribute to the link's <a> tag
- Redirecting the links to an intermediate page that is blocked from search engines with a robots.txt file
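If you sell links, you may want to audit your pages for paid links that were published without the first designation. The sketch below uses Python's standard `html.parser` to list links whose `rel` attribute lacks a `nofollow` token. It treats `rel` as a space-separated list, so `rel="nofollow noopener"` still counts. The class and function names, and the advertiser URLs in the usage, are hypothetical. The sketch does not check the second option, redirecting through a page blocked by robots.txt.

```python
from html.parser import HTMLParser


class FollowedLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags that do not carry rel="nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.followed: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.followed.append(href)


def links_missing_nofollow(html: str) -> list[str]:
    """Return the hrefs on a page that would pass PageRank (no nofollow)."""
    finder = FollowedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.followed
```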
Father's Day is less than a week away, and graduation season is in full swing. To help you find gifts for your dads and grads, RitzCamera.com and BoatersWorld.com are offering 5% off Google Checkout orders until Sunday, June 17th (that's Father's Day). Just enter coupon code 'dad' after you click on the Google Checkout button.
Posted by Alden DeSoto, Google Analytics Team
Monday, June 11, 2007
This afternoon we experienced a brief outage, during which about half our users seemed to lose their subscriptions. This can happen when one of the many complex systems that power Google Reader experiences a glitch. We work hard to avoid problems of any kind, but occasionally something like this happens. The good news is that no data was actually lost; it was just temporarily inaccessible. Google's systems store data redundantly to minimize the chance of anything becoming permanently lost.
We were able to identify, diagnose, and fix today's outage within an hour, which is the kind of response time that we strive for. We'll continue to give quick status updates on problems like this in the future so users who have trusted us with their data can feel comfortable doing so.
In addition to targeting malware, we're interested in combating phishing, a social engineering attack where criminals attempt to lure unsuspecting web surfers into logging into a fake website that looks like a real website, such as eBay, E-gold or an online bank. Following a successful attack, phishers can steal money out of the victims' accounts or take their identities. To protect our users against phishing, we publish a blacklist of known phishing sites. This blacklist is the basis for the anti-phishing features in the latest versions of Firefox and Google Desktop. Although blacklists are necessarily a step behind as phishers move their phishing pages around, blacklists have proved to be reasonably effective.
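As a simplified picture of how a client might consult such a blacklist, the sketch below checks a URL against entries made of a hostname and a path prefix. The entry format, the `is_blacklisted` function, and the sample hosts are all assumptions for illustration. This is not the format or lookup protocol used by Google's published list or by Firefox. Hostnames are compared exactly, after the lowercasing `urlsplit` performs, so a look-alike domain is not matched by accident.

```python
from urllib.parse import urlsplit

# Hypothetical entry format: (hostname, path prefix).
Blacklist = set[tuple[str, str]]


def is_blacklisted(url: str, blacklist: Blacklist) -> bool:
    """True if the URL's host equals a listed host and its path starts with the listed prefix."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").rstrip(".")  # .hostname is lowercased
    path = parts.path or "/"
    return any(host == h and path.startswith(p) for h, p in blacklist)


if __name__ == "__main__":
    known_phish: Blacklist = {("login-profile.example", "/")}
    print(is_blacklisted("http://LOGIN-Profile.example/signin.php?id=1", known_phish))
    print(is_blacklisted("http://www.example.com/", known_phish))
```

Because a lookup like this only matches entries already on the list, it also illustrates why blacklists stay a step behind when phishers move their pages.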
Not all phishing attacks target sites with obvious financial value. Beginning in mid-March, we detected a five-fold increase in overall phishing page views. It turned out that the phishing pages generating 95% of the new phishing traffic targeted MySpace, the popular social networking site. While a MySpace account does not have any intrinsic monetary value, phishers had come up with ways to monetize this attack. We observed hijacked accounts being used to spread bulletin board spam for some advertising revenue. According to this interview with a phisher, phishers also logged in to the email accounts of the profile owners to harvest financial account information. In any case, phishing MySpace became profitable enough (more than phishing more traditional targets) that many of the active phishers began targeting it.
Interestingly, the attack vector for this new attack appeared to be MySpace itself, rather than the usual email spam. To observe the phishers' actions, we fed them the login information for a dummy MySpace account. We saw that when phishers compromised a MySpace account, they added links to their phishing page on the stolen profile, which would in turn result in additional users getting compromised. Using a quirk of the CSS supported in MySpace profiles, the phishers injected these links invisibly as see-through images covering compromised profiles. Clicking anywhere on an infected profile, including on links that appeared normal, redirected the user to a phishing page. Here's a sample of some CSS code injected into the "About Me" section of an affected profile:
In addition to contributing to the viral growth of the phishing attack, linking directly off of real MySpace content added to the appearance of legitimacy of these phishing pages. In fact, we received thousands of complaints from confused users along the lines of "Why won't it let any of my friends look at my pictures?" regarding our warnings on these phishing pages, suggesting that even an explicit warning was not enough to protect many users. The effectiveness of the attack and the increasing sophistication of the phishing pages, some of which were hosted on botnets and were near perfect duplications of MySpace's login page, meant that we needed to switch tactics to combat this new threat.
In late March, we reached out to MySpace to see what we could do to help. We provided lists of the top phishing sites and our anti-phishing blacklist to MySpace so that they could disable compromised accounts with links to those sites. Unfortunately, many of the blocked users did not remove the phishing links when they reactivated their accounts, so the attacks continued to spread. On April 19, MySpace updated their server software so that they could disable bad links in users' profiles without requiring any user action or altering any other profile content. Overnight, overall phishing traffic dropped by a factor of five back to the levels observed in early March. While MySpace phishing continues at much lower volumes, phishers are beginning to move on to new targets.
Things you can do to help end phishing and Internet fraud
- Learn to recognize and avoid phishing. The Anti-Phishing Working Group has a good list of recommendations.
- Update your software regularly and run an anti-virus program. If a cyber-criminal gains control of your computer through a virus or a software security flaw, he doesn't need to resort to phishing to steal your information.
- Use different passwords on different sites and change them periodically. Phishers routinely try to log in to high-value targets, like online banking sites, with the passwords they steal for lower-value sites, like webmail and social networking services.
Have you ever wondered which of your favorite bands are coming to town? The Touring gadget, by Martin Mroz, makes finding out easy. You enter your location, and using a simple Google Desktop API and the music community website JamBase, the Touring gadget shows you which of your favorite bands are coming to your town soon.
Touring gathers your favorite bands by using the Google Desktop Query API. When Google Desktop indexes the user's files, it extracts metadata from music files and stores it. Touring queries for music files and pulls out the artist. It only takes a few lines of code to get this data.
Playing a game with your friends around the world isn't hard if the game uses the gadget GoogleTalk API. Multiplayer Reversi, by Turhan Aydin, illustrates this point and received lots of "oohs" and "ahhs" when Mihai presented it in San Jose. You select your friend to play with, they confirm, and you start to play reversi. If this sounds difficult, don't worry, it isn't: look at the code snippets.