opensource.google.com

Menu

Celebrating TensorFlow’s First Year

Wednesday, November 9, 2016

Originally posted on Google Research blog

It has been an eventful year since the Google Brain Team open-sourced TensorFlow to accelerate machine learning research and make technology work better for everyone. There has been an amazing amount of activity around the project: more than 480 people have contributed directly to TensorFlow, including Googlers, external researchers, independent programmers, students, and senior developers at other large companies. TensorFlow is now the most popular machine learning project on GitHub.


With more than 10,000 commits in just twelve months, we’ve made numerous performance improvements, added support for distributed training, brought TensorFlow to iOS and Raspberry Pi, and integrated TensorFlow with widely-used big data infrastructure. We’ve also made TensorFlow accessible from Go, Rust, and Haskell, released state-of-the-art image classification models – and answered thousands of questions on GitHub, StackOverflow, and the TensorFlow mailing list along the way.

At Google, TensorFlow supports everything from large-scale product features to exploratory research. We recently launched major improvements to Google Translate using TensorFlow (and Tensor Processing Units, which are special hardware accelerators for TensorFlow). Project Magenta is working on new reinforcement learning-based models that can produce melodies, and a visiting PhD student recently worked with the Google Brain team to build a TensorFlow model that can automatically interpolate between artistic styles. DeepMind has also decided to use TensorFlow to power all of their research – for example, they recently produced fascinating generative models of speech and music based on raw audio.

We’re especially excited to see how people all over the world are using TensorFlow. For example:

  • Australian marine biologists are using TensorFlow to find sea cows in tens of thousands of hi-res photos to better understand their populations, which are under threat of extinction. 
  • An enterprising Japanese cucumber farmer trained a model with TensorFlow to sort cucumbers by size, shape, and other characteristics.
  • Radiologists have adapted TensorFlow to identify signs of Parkinson’s disease in medical scans.
  • Data scientists in the Bay Area have rigged up TensorFlow and the Raspberry Pi to keep track of the Caltrain.

We’re committed to making sure TensorFlow scales all the way from research to production and from the tiniest Raspberry Pi all the way up to server farms filled with GPUs or TPUs. But TensorFlow is more than a single open-source project – we’re doing our best to foster an open-source ecosystem of related software and machine learning models around it:

  • The TensorFlow Serving project simplifies the process of serving TensorFlow models in production.
  • TensorFlow “Wide and Deep” models combine the strengths of traditional linear models and modern deep neural networks. 
  • For those who are interested in working with TensorFlow in the cloud, Google Cloud Platform recently launched Cloud Machine Learning, which offers TensorFlow as a managed service.

Furthermore, TensorFlow’s repository of models continues to grow with contributions from the community, with more than 3000 TensorFlow-related repositories are listed on GitHub alone! To participate in the TensorFlow community, you can follow our new Twitter account (@tensorflow), find us on GitHub, ask and answer questions on StackOverflow, and join the community discussion list.

Thanks very much to all of you who have already adopted TensorFlow in your cutting-edge products, your ambitious research, your fast-growing startups, and your school projects; special thanks to everyone who has contributed directly to the codebase. In collaboration with the global machine learning community, we look forward to making TensorFlow even better in the years to come!

By Zak Stone, Product Manager for TensorFlow

Google Summer of Code 2016 blog post round-up

Tuesday, November 8, 2016

We’re publishing guest posts from Google Summer of Code (GSoC) students, mentors and organizations every week and more are coming. Many have already written GSoC wrap-up posts on their own blogs, so we’ve rounded them up for you to explore.


Static types in Python, oh my(py)!” by Tim Abbott, org admin for Zulip
“We posted mypy annotations as one of our project ideas for Google Summer of Code (GSoC). We found an incredible student, Eklavya Sharma, for the project. Eklavya did the vast majority of the hard work of annotating Zulip. Amazingly, he also found the time during the summer to migrate Zulip to use virtualenvs and then upgrade Zulip to Python 3!”


A road from Google Summer of Code student to organization administrator” by Araz Abishov, org admin for HISP
“Google has created unprecedented opportunity both for young developers and open source communities, which I think everyone should take advantage of. GSoC is more than just a three months internship, and I hope that this post will be a good example of how it can change anyone’s life.”


Summer of Code 2016: Wrapping it up” by Martin Braun, org admin for GNU Radio
“This summer was a great summer in terms of student participation. All three students will be presenting their work (either in person, or via poster) at this year’s GNU Radio Conference in Boulder, Colorado.”


2016 Google Summer of Code Wrap-Up” by Ed Cable, org admin for Mifos Initiative
“Each year GSoC continues to unite and grow our community in different ways. Once again, we received incredibly valuable contributions to our Mifos X web and mobile clients this summer; most importantly we have cultivated numerous passionate contributors that will be a part of our community long into the future.”


Road to GSoC 2016” by Minh Chu, student who worked on Neverland for KDE
“I was nervous about choosing a project. So many projects and requirements! After many hours, I finally decided to write a proposal for KDE’s Neverland Theme Builder and was accepted.”


Git Rev News” by Christian Couder, mentor for Git
“Such performance improvements as well as the code consolidations around the sequencer are of course very nice. It is interesting and satisfying to see that they are the result of building on top of previous work over the years by GSoC students, mentors and reviewers.”

Elasticsearch Lua II" by Dhaval Kapil, student who worked with LabLua
“My GSoC project this year was entitled ‘Improve elasticsearch-lua tests and builds’ and was a continuation of the work that I had done last year. Apart from adding a test suite for elasticsearch-lua and making it robust, I also decided to work on the documentation of the code."

Google Summer of Code 2016 Conclusion” by Amine Khaldi, org admin for ReactOS
“Students stumble upon many of the same difficulties ReactOS' own senior developers encountered during their early days, including that ever painful but necessary step to using a proper debugger instead of relying on printf statements in the code.”


My Journey in Open Source / How to Get Started Contributing” by Nelson Liu, student who worked on scikit-learn for PSF
“The best way to get started is to simply jump in! There are a myriad of ways to contribute to an open source project. Obviously, writing code to fix bugs, add new features, or enhance existing ones are useful. However, you don't have to write code to help out!”

Google Summer of Code 2016 Student Projects” by Pankaj Nathani, org admin for BuildmLearn
“Many open source projects like ours really benefit from this initiative of Google. Not only do we get large number of university students interested to work on our projects during summer; we also gain new long term contributors and project maintainers."

Lasp and the Google Summer of Code” by Borja o’Cook, student who worked on Lasp for BEAM Community
“All in all, it's been an amazing experience. I've received a lot of support from my mentors and teammates; the Lasp team is full of incredible people.”


GSoC 2016 Students in TEAMMATES” by Damith C. Rajapakse, org admin for TEAMMATES
“We had our biggest batch of students (7 students) in GSoC 2016, selected from 93 proposals, and representing 4 countries and 4 universities, working on TEAMMATES (an online feedback management system for education) and related sub projects.”


User-friendly encryption now in Drupal 8!” by Colan Schwartz, mentor for Drupal
“There were several students interested in the topic, and wrote proposals to match. Talha Paracha's excellent proposal was accepted, and he began in earnest. With Adam Bergstein (nerdstein) and I mentoring him, Talha successfully worked through all phases of the project.”


GSoC with Shogun” by Sanuj Sharma, student who worked on Shogun
“This was an excellent learning experience for me and I got to work with people from different countries (UK, Russia, Singapore, Germany) and cultures. I highly recommend students to participate in Google Summer of Code by looking for projects that interest them because having open source experience is highly beneficial, especially for programmers.”


We have wrap-up posts coming out every week so stay tuned for more. If you’re interested in participating in Google Summer of Code 2017, you can find details here.

By Josh Simmons, Open Source Programs Office

Announcing the Google Code-in 2016 mentor organizations

Monday, November 7, 2016

We’re excited to introduce the 17 open source organizations that are participating as mentor organizations for Google Code-in 2016. The contest, now in its seventh year, gives 13-17 year old pre-university students the opportunity to learn under the guidance of mentors by using their skills on real world applications, that is, open source projects.

Google Code-in officially starts for students on November 28, but students are encouraged to learn about the mentor organizations ahead of time and can get started by clicking on the links below.


  • Apertium - rule-based machine translation platform
  • BRL-CAD - computer graphics, 2D and 3D geometry modeling, and computer-aided design (CAD)
  • CCExtractor - open source tools for subtitle generation
  • Copyleft Games - building game development platforms for tomorrow
  • Drupal - content management platform
  • FOSSASIA - developing communities across all ages and borders to form a better future with Open Technologies and ICT
  • Haiku - operating system specifically targeting personal computing
  • KDE - team that creates Free Software for desktop and portable computing
  • MetaBrainz - builds community maintained databases
  • Mifos Initiative - transforming the delivery of financial services to the poor and the unbanked
  • MovingBlocks - like an open source Minecraft
  • OpenMRS - open source medical records system for the world
  • SCoRe - research lab that seeks sustainable solutions for problems faced by developing countries
  • Sugar Labs - learning platform and activities for elementary education
  • Systers - community for women involved in the technical aspects of computing
  • Wikimedia - non-profit foundation dedicated to bringing free content to the world, operating Wikipedia
  • Zulip - powerful, threaded open source group chat with apps for every major platform
Mentor organizations are currently creating thousands of tasks for students covering code, documentation, user interface, quality assurance, outreach, research and training. The contest officially starts for students on Monday, November 28th at 9:00am PST.

You can learn more about Google Code-in on the contest site where you’ll find Contest Rules, Frequently Asked Questions and Important Dates. There you’ll also find flyers and other helpful information including the Getting Started Guide. Our discussion mailing list is a great way to talk with other students, mentors and organization administrators about the contest. For questions about eligibility or other general questions, you can contact us at [email protected].

By Josh Simmons, Open Source Programs Office

Podcast to YouTube: an open source story

Friday, November 4, 2016

Almost a year ago Mark Mandel and I started the Google Cloud Platform Podcast, a weekly podcast that covers topics related to Google Cloud Platform, among other things. It's been a pretty successful podcast, but that’s not what I want to write about today.

After a while we started receiving emails from listeners that wanted to access our podcast on YouTube. Even though this might seem strange for those that love podcasts and have their favorite app on their phones, we decided that the customer is always right: we should post every episode to YouTube.

Specifications

Ok, so … how? Well, to create a video I need to merge the mp3 audio from an episode with a static image. Let's include the title of the episode and the Google Cloud Platform Podcast logo.


But once we post the video to YouTube we're going to need more than that! We need a description, some tags, and probably a link to the episode (SEO FTW!).

Where can we get that information from? Let's think about this for a minute. Where are others getting this information from? The RSS feed! Would it be possible to create a tool to which I could say "post the video for episode 46" and a couple minutes later the video appeared on YouTube? That'd be awesome! Let's do that!

Architecture

The application I wrote parses an RSS feed and given the episodes to publish it downloads the metadata and audio for an episode, generates the corresponding videos, and pushes them to YouTube.
Diagram of the flow of data in podcast-to-youtube
The hardest parts here are the creation of the image and the video. The rest is sending HTTP requests right and left.

Image Maker: rendering images in pure Go

After trying a couple of different tools I decided that the easiest was to create the image from scratch in Go using the image package from the standard library and a freetype library available on GitHub.

Probably the most fun part was to be able to choose a font that would make the title fit the image correctly regardless of the length in characters. I ended up creating a loop that:
  • chooses a font and measures the width of the resulting text
  • if it's too wide, decreases the font size by one and repeats.
Surprisingly, for me, this is actually a pretty common practice!

It is also worth mentioning the way I test the package: Using a standard image that I compare to the one generated by the package, then showing a "diff" image where all the pixels that differ are highlighted in red.
Diff image generated when using a wrong DPI.
The code for this package is available here.

Video maker: ffmpeg is awesome

From the beginning I knew I would end up using ffmpeg to create my video. Why? Well, because it is as simple as running this command:

$ ffmpeg -i image.png -i audio.mp3 video.mp4

Easy right? Well, this is once ffmpeg has been installed and correctly configured, which is actually not that simple and would make this tool hard to install on any machine.

That's why the whole tool runs on Docker. Docker is a pretty widespread technology, and thanks to Makefile I'm able to provide a tool that can be run like this:

$ make run

Conclusion

It took me a couple of days to write the tool and get it to a point where I could open source it, but it was totally worth it. I know that others will be able to easily reuse it, or even extend it. Who knows, maybe this should be exposed as a web application so anyone can use it, no Docker or Makefile needed!

I am currently using this tool weekly to upload the Google Cloud Platform Podcast episodes to this playlist, and you can find the whole code on this GitHub repository.

Any questions? I'm @francesc on Twitter.

By Francesc Campoy, Developer Advocate

Cilium: Networking and security for containers with BPF and XDP

Wednesday, November 2, 2016

This is a guest post by Daniel Borkmann who was recently recognized through the Google Open Source Peer Bonus program for his work on the Cilium project. We invited Daniel to share his project on our blog.

Our open source project, called Cilium, started as an experiment for Linux container networking tackling four requirements:

  • Scale: How can we scale in terms of addressing and with regards to network policy?
  • Extensibility: Can we be as extensible as user space networking in the Linux kernel itself?
  • Simplicity: What is an appropriate abstraction away from traditional networking?
  • Performance: Do we sacrifice performance in the process of implementing the aforementioned aspects?

We realize these goals in Cilium with the help of eBPF. eBPF is an efficient and generic in-kernel bytecode engine, that allows for full programmability. There are many subsystems in the Linux kernel that utilize eBPF, mainly in the areas of networking, tracing and security.

eBPF can be attached to key ingress and egress points of the kernel's networking data path for every network device. As input, eBPF operates on the kernel's network packet representation and can thus access and mangle various kinds of data, redirect the packet to other devices, perform encapsulations, etc.

This is a typical workflow: eBPF is programmed in a subset of C, compiled with LLVM which contains an eBPF back-end. LLVM then generates an ELF file containing program code, specification for maps and related relocation data. In eBPF, maps are efficient key/value stores in the kernel that can be shared between various eBPF programs, but also between user space. Given the ELF file, tools like tc (traffic control) can parse its content and load the program into the kernel. Before the program is executed, the kernel verifies the eBPF bytecode in order to make sure that it cannot affect the kernel's stability (e.g. crash the kernel and out of bounds access) and always terminates, which requires programs to be free of loops. Once it passed verification, the program is JIT (just-in-time) compiled.

Today, architectures such as x86_64, arm64, ppc64 and s390 have the ability to compile a native opcode image out of an eBPF program, so that instead of an execution through an in-kernel eBPF interpreter, the resulting image can run natively like any other kernel code. tc then installs the program into the kernel's networking data path, and with a capable NIC, the program can also be offloaded entirely into the hardware.


Cilium acts as a middle layer, plugs into container runtimes and orchestrators such as Kubernetes, Docker or CNI, and can generate and atomically update eBPF programs on the fly without requiring a container to restart. Thus, unlike connection proxies, an update of the datapath does not cause connections to be dropped. These programs are specifically tailored and optimized for each container, for example, a feature that a particular container does not need can just be compiled out and the majority of configuration becomes constant, allowing LLVM for further optimizations.

We have many implemented building blocks in Cilium using eBPF, such as NAT64, L3/L4 load balancing with direct server return, a connection tracker, port mapping, access control, NDisc and ARP responder and integration with various encapsulations like VXLAN, Geneve and GRE, just to name a few. Since all these building blocks run in the Linux kernel and have a stable API, there is of course no need to cross kernel/user space boundary, which makes eBPF a perfectly suited and flexible technology for container networking.

One step further in that direction is XDP, which was recently merged into the Linux kernel and allows for DPDK-like performance for the kernel itself. The basic idea is that XDP is tightly coupled with eBPF and hooks into a very early ingress path at the driver layer, where it operates with direct access to the packet's DMA buffer.

This is effectively as low-level as it can get to reach near-optimal performance, which mainly allows for tailoring high-performance load balancers or routers with commodity hardware. One advantage that comes with XDP is also that it reuses the kernel's security model for accessing the device as opposed to user space based mechanisms. It doesn't require any third party modules and works in concert with the Linux kernel. Both XDP and tc with eBPF are complementary to each other, and constitute a bigger piece of the puzzle for Cilium itself.

If you’re curious, check out the Cilium code or demos on GitHub.


By Daniel Borkmann, Cilium contributor
.