All The Government’s Information

MALE SPEAKER: Welcome and
thanks for coming. I’d like to introduce Carl
Malamud who I’ve known now for more years than I suspect either
of us could count. And I’m proud to say that Carl
is my kind of troublemaker. He’s caused several interesting
bits of trouble over the years, and he’s here
to tell us about more. So with that, Carl Malamud. CARL MALAMUD: I guess I won’t
use a standard line: I’m from Washington, and I’m
here to help you. Actually the line I use in
Washington is, I’m from the internet, and I’m here
to help you. I thought I would start and give
you ten minutes of background, just to kind of let
you know where I’m coming from and then talk about a new
project that I’ve been putting together for about a year now. For most of the ’80s, I did
professional reference books, and I consulted with
the government. I actually started working for
the Federal Reserve board and did their research network in
about ’82, and about 1990 I kind of figured that DECnet
and Novell and all that stuff just wasn’t going to scale
and that this TCP/IP thing might do the job. And I’d been on a kick of
putting standards online. I just felt that if you’re going
to have a standard it ought to be readable by people
and so I had spent several years attacking ITU and ANSI
and ISO and doing the OSI TCP/IP fight. And about 1990, I got an
appointment with the secretary general of the ITU. And I flew to Geneva and went
and saw him, and said you know your standards ought
to be online. And his answer was you know we’d
love to put them online but you know it’s technically
impossible because you see our standards that we do are
published in a system that we developed in the ’70s, and
we lost the source code. It was a text formatting
system. And I was like well you know
maybe if you gave me some tapes so I could do something
with them and I’d give you a copy back of your stuff in a
more modern format and we’d put it on this little research
network the internet. And that’s kind of how he
referred to the internet as this little research network. And he figured, ah, no
harm, there’s only a few people there. And so I went back to the
University of Colorado with my friend Mike Schwartz, who did
Harvest and stuff like that, and we printed out octal dumps
of all their tapes and then we took the ITU blue book, and we
spread it on the floor and we started looking. And we said, well that looks
like an e, that might be a paragraph mark, we ran a little
program and translated it from the old format that they
had it in, which actually was using three different
character sets one of which they invented themselves. And we took this thing and we
turned it into troff, which was considered quite modern at
the time, and we released it on the internet. Two things happened. We got a call from the National
Science Foundation saying you’re using up all our
bandwidth on the international links because everybody’s making
copies of this thing. International links were 56
kilobits at the time. And a few weeks later, I got
a letter from the secretary general saying, well thank you
so much for the source code, we have decided to terminate
the experiment. Please retrieve all copies. I was like this is going to be
hard because there’s been a whole bunch made. I’ll take my copy offline and
I did, and the forty other mirrors were kind of there. I wrote a book about it called
Exploring the Internet, which kind of documented this
ITU thing, and while I was doing it I was traveling around the world
meeting folks that were building the internet at the
time, people like Paul Vixie and Jun Murai in Japan, and
wrote a book about it. And that kind of brought me to
the next phase of what I wanted to do which is I decided
to start a radio station on the internet. This was about ’93. I figured what the hell this
won’t be hard, we’ll just put audio files online. So we started a program called
Geek of the Week. And Geek of the Week ran
for about a year. There’s a point to this. I did this in Washington DC, and
we figured out since we’re doing Geek of the Week
we might as well do some more fun stuff. And so I went to the National
Press Club and said you know you’ve got an extra broadcast
booth, you’ve got one for C-SPAN, you’ve got one for
National Public Radio, could we have that extra one? And we went to the governors and
finally convinced them that we’re allowed to put their stuff
out live on the net. And as part of that ended up
looking at government archives in Washington. And I think I spent most of
the last decade working on this issue of putting government
stuff online. So the radio station led to
a little demo on the hill. And when I was on the hill a
congressman came up to me and he said you know some Nader’s
Raiders sent me a letter, and said the SEC EDGAR database
ought to be online. Why isn’t it? So I said, I don’t know
I’ll go look at it. So I looked at it, and I said
this doesn’t look that hard it could probably be done. And so I found myself in a
meeting with the congressional staff and the Securities and
Exchange Commission. And at which point I heard the
same answer I had heard from the ITU. It’s like, why should
we put the stuff online? It’d be technically
impossible. Well c’mon it can’t
be that hard. So we rigged up a little deal
with the National Science Foundation in which we took
money from American taxpayers, from the NSF, and we used it to
buy the data from the SEC, American data, in order to give
it back to people on the internet and run it for free. Which we did for about
two years. And at the end of two years put
a little sign up on the internet that said the service
will terminate in 60 days, click here to send mail to Newt
Gingrich, click here to send mail to Al Gore, click
here to send mail to the chairman of the SEC. Now he didn’t have email at the
time and so we kind of did a proxy form. We took all the email which was
about 15,000 protests and we printed them out, walked
them down to the SEC, and actually crashed one
of their meetings. And said this stuff ought
to be online. At which point we finally got a
meeting with SEC management, and they’re like well how
much would this cost? Well it cost us a couple hundred
grand a year; it will cost you a million. And they’re like OK. Do people really
use this stuff? It’s like well, yes. This is not a bunch of fat cat
lawyers and financial types, these were senior citizens, and
journalists, and students. We were up to 70,000 or
80,000 people a day using these SEC documents. And to the chairman’s credit,
he said well OK we should be doing this. But, unfortunately, our internet
connection isn’t working and I don’t think
we have a computer. They had a mainframe. And so we took a four processor
Sun and put it in a station wagon and drove it down
to the SEC and configured their T1 line for them and got
them up and running. And that worked out real well. Now the patent office, on the
other hand, we had put the patent data online just to
prove that the SEC stuff wasn’t just like so simple,
it was a no brainer. So we put the patent data online
and, unfortunately, the patent commissioner decided that
the patents were a source of revenue not a way of
promoting the useful arts and sciences by letting people know
what these inventions were but a revenue source. And so he refused. And so about ’98 I sent a letter
to Al Gore, which said you know this stuff really
ought to be online and I didn’t come right out and
threaten but basically the gist of the letter was within
60 days if you guys don’t decide to do this we’re
going to do it. We’re gonna run it for
two years and then we’re gonna stop it. And so we started mirroring
the patent office, and I started making lots of calls
to the patent office saying well you know how much
exactly would it cost for all of the data? And if I wanted twenty years’
worth how much would that be? And when could you deliver it? And sure enough on day 59
Commissioner Lehman went to the American Bar Association and
explained that as part of the normal planning process they
had been undergoing they were going to be putting
this data online. The point of this is that,
as an individual, you can actually change the way
government does things. That is possible at times. Sometimes it’s hard. Sometimes they really don’t
want to do it and they go kicking and screaming, but
that’s kind of the business I’ve been in for the
last ten years. So I’m going to tell you one
more war story and then we’ll get to the topic at hand. I’m actually on a red eye
back this evening. At 6:30, I land at Dulles. I get in my car, go find my
suit, and drive up to the capitol to testify on the
issue of the Smithsonian Showtime contract. I don’t know if you guys have
been reading about this. But the Smithsonian decided to
cut a contract with Showtime, and not release the contract. And the deal, as we understand
it, is that if you want to make a film using the
Smithsonian collections or access to staff, you have
to give Showtime a right of first refusal. So I read an article on
Boing Boing, which talked about an article in the New York
Times that said Ken Burns was upset by this, and that the
contract was secret and they were refusing
to divulge it. So I thought ah come
on this is silly. So I sent a FOIA request in and
that got me sucked into the middle of this little
Smithsonian Showtime fight. The story gets worse. It turns out this contract
is a 30 year contract signed this year. Now think about this, 30 years
ago, 1976, if you had signed a 30 year contract for the
future of things. In 1976 there were about 200
computers on the internet, WTBS had just started going up
to satellite, the cable industry was just starting to happen,
C-SPAN didn’t exist, and the next hot thing that
was just about to happen was the fax machine. Faxes were big in Japan they
were just starting to happen in the US. So just imagine a 30 year
contract in ’76. We also found out, through an
anonymous tip to EFF, who was representing me in the FOIA
suit, that surprise, surprise web masters at the Smithsonian
are being told to take audio and video off the public web
sites because it conflicts with the non-compete clause
of the Showtime contract. So I’ve been called
up to the hill. They were going to do hearings
with the secretary anyway, but they’ve decided that a public
panel is in order. I’ve got a pointer to my testimony
and pointers to some other things that
are out there. I just wanted to give you a
little context kind of where I’m coming from. I’m from Washington, but I’m
kind of from the outside part of Washington. I work for a think tank called
the Center for American Progress and we’re kind of a
liberal, nonpartisan, left-wing think tank. I have to be careful;
we’re a 501(c)(3). So about a year and a half ago
when I got to Washington and joined the center, I got called
in by my boss, John Podesta, who used to be
Clinton’s chief of staff, and he’s like, IPTV is really big,
we should do something. What should we do? And I thought to myself, and
this is last February I guess, and I thought to myself you know
there’s something that’s always really bugged me. And what bugs me is when you go
to Washington and you go to a congressional hearing or you
go to an executive agency briefing, you really need to
be in Washington to observe the way our government works. Now you know there’s some web
casting going on by a few congressional committees. There are some other things that
are occasionally online. C-SPAN does a great job
on the stuff they do. But they’re three channels;
they can only do so much. And there are probably 10 or 15
events happening every day in Washington and there is
no permanent archive. When I’m doing research and I
want to go back and see a hearing, I want to see what
so and so had to say about patents or about this or about
that, that information has typically disappeared. It’s just not there. So I started digging into how do
you get access to hearings? AUDIENCE: Aren’t there
stenographers? CARL MALAMUD: There are
stenographers, but the video and the audio aren’t there. Even with the stenographers,
even with the transcripts, not all of those are available. There are proprietary
services that make a lot of that available. Some of it shows up many
months later from the congress, but it’s not like
gee something interesting happened yesterday I definitely
will be able to find it by searching
for it in Google. It’s just a hit or miss thing. So the way it works in
Washington if you are a congressional committee, or
executive briefings, the Federal Communications
Commission does hearings, patent office does them,
all sorts of folks do these things. There’s two ways you get your
audio and video out. You do it yourself. And the way that typically
works is a congressional committee gets their IT dude
to buy a camera and the IT dude goes to the back of the
room and does the camera thing, nothing as sophisticated
as Google Video, and then they go to some
commercial web casting company and they stream it
out, typically in RealAudio or RealVideo. The other way you do it is you
let the media do it for you. And there’s some interesting
things that have happened to the media market
in Washington. It used to be if you’re going to
do a hearing live, what you did is you hired a satellite
truck and you parked it out front, and then you ran your
cables in and you hooked up your camera. And that’s how you got your
video out of the hearing rooms. Verizon has put together a
service called the Audiovisual Operations Center, the AVOC. And they have fiber going
out to most of these hearing rooms now. And so if your major media, what
you do is you go to the AVOC, and this is just a
dedicated video-over-fiber switch, so this is using the SDI
standard: 280 megabits per second, standard broadcast
quality video on top of fiber. All right? So it’s not TCP/IP or
anything like that. What you do is you lease one
line, you lease a switch at AVOC, you lease a line out to
the capitol or the state department or the White House
or whatever you’re trying to cover, you lease another line
back to where you are and then you tell the switch, connect
those two things together and you get your video. And this infrastructure
is fairly extensive. And so if you’re major media
that’s what you do. There’s a little wrinkle
for the congress. Because the congress has so
many hearing rooms, the correspondent’s gallery
has put together a fiber optic project. And so you don’t actually lease
fiber from the AVOC switch all the way to the
hearing room you run your fiber into the basement of the
capitol and then use the existing fiber infrastructure on
the hill to get back out to the various hearing rooms. Not
all of them are wired now, but most of them are. So what I’m trying to put
together is a little system that provides what I call
the Washington bridge. And the Washington bridge is a
gateway between the hearing rooms in Washington and either
the real world or the internet, depending on who you’re
talking to. And the idea here is to get 16
of these rooms up and running, stream the data out live to
the net, and archive it permanently. Now there’s a couple twists
involved in this that make it interesting. To stream it out live to the
net, what I’m looking at is what I call a wholesale
retail model. So we get all this video
coming in and it’s 280 megabits per second. And you probably don’t want to
feed 16 280 megabit per second streams out on the internet
on the public peering infrastructure. But you can convert that stuff
into either MPEG-2 at 50 megabits per second, which is
really high quality stuff, or MPEG-4 even at eight megabits
per second is still pretty high quality stuff. And there’s no reason why you
can’t take all these 16 streams, convert them to MPEG
over TCP/IP, run it out to one of the large co-location
facilities, such as Equinix or PAIX, and make it available to
people like Google Video, people like NBC, to any other
folks that might want to take this data and do something
with it. In your case, there’s
lots of stuff you’d probably do with it. And so this service is intended
to provide the highest quality feeds we
possibly can to folks like Google Video and NBC and the
others and also provide somewhat of a retail operation
if you will. And by retail I don’t mean
charging money, but I mean direct service to end users. And the reason for doing that
instead of just relying on Google Video and NBC and
everybody else to kind of provision this and do it
properly, is because I’d like to view this system
as somewhat of a research test bed. Right? As something that
we can do some interesting work on top of. So I’m working with some folks
from Cisco who are very interested in, OK, how do we sign
all these video streams to make sure that there’s some
authenticity in there if people change it. Some folks at Sun are very
interested in automatic generation of metadata, anything
from speech to text, which as you know is a little
iffy these days, to other things video scene
identification and there’s a whole bunch of stuff
you can do. So how do you put something like
this together and roll it out the door? What we’re doing is I’m in the
process of creating a new 501(c)(3) nonprofit and it’s
called the Public Memory Trust and it has a mission of building
public works projects on the internet. Project number one is
the Washington bridge. It’s got a pretty simple
business model, and it’s one that I’ve used all of my life
which is try to get companies like Sun and Cisco to give me
a boat load of hardware. And so I’m talking to both
companies at pretty high levels, these are up at the
executive committee level, and they’re very interested. Both companies have some
really good engineers working with me. Cisco is easy because there is
a whole bunch of really good routing people there,
and they’re very interested in this. Sun is easy as well because
we’re talking a lot of storage here. And in the long run, we’re
asking Sun for a petabyte of disk. And in the long run, I would
like to see every library in the country have a petabyte of
disk and a permanent archive of all congressional
proceedings. But you’ve got to get
this stuff started. So the business model is part, I
won’t say free hardware, but hardware donated by companies
that want to see something like this happen for a variety
of reasons, and part cash. And I’m talking to a variety
of potential cash sponsors. I’ve also talked to the
government printing office, and they’re very interested
in this. And I actually went to see Bruce
James, who is the Public Printer of the United States,
an appointee of the President, and he is actually the
official service provider for all three branches
of government. He does congress, the judiciary,
and the executive. And I said in the long run you
need to be doing this, right? This is your job; we’re not
going to be doing this forever. But we’d like to run it for
three or four years and prove that it can happen. And I was actually surprised
by the reception I got, the reception was very positive. It was like this is good. We’re not going to be able
to do this right away. You can run a lot faster
than we can. This is a good thing we’d
like to support you. And so we kind of asked, how do you
do this because you don’t really want to start a nonprofit
and immediately take a lot of government funding
because there’s a lot of strings attached. They’re going to want a full
spec ahead of time. And so we came up with what
I think is a great way of doing things. Which is they have a Fellows
program in which some of their young, bright staff go out to
the agencies for a year. And they’ve agreed to give us a
GPO fellow which is prepaid staff for us, free training for
them, and most importantly a linkage with the government. It’s what they call
a public-private partnership these days. And so that means we’re going to
have two routes to be able to walk in to an agency or to
the congress and say hi we want to get our cameras
in here and get this video online. One is to go in as media the
other is to simply go in as somebody helping the agencies
get up and running. And so that’s the kind of two
different paths that we’re going to be able to go
in on this thing. And so it’s a new project. I’m here sort of talking about
vaporware, but it’s vaporware I’ve been working
on for a year. I think it looks like this
thing is going to happen. We’re in the process of writing
the articles of incorporation. I’ve had a lot of talks
with a variety of people within Google. Eric Schmidt used to be on the
board of directors of our radio station way back when, and
Vint Cerf was also on my board of directors. And so I’ve kind of pitched the
project to them, and they were like this is interesting,
go talk to the engineers. And so I called Steve up and
said can I do a tech talk? And that’s what brings
me here. So that’s kind of the
prepared part of what I’m talking about. But I really just wanted to
see if you folks had some questions or some interesting
tweaks on this, interesting research projects we can do. There’s some feasibility
questions, how do you engineer this? It turns out doing data centers
in Washington DC these days is really hard. All the CoLOS are really either
at capacity or doing a bad job one of the two. And so there’s some real
challenges in engineering something like this. But I haven’t seen any
show stoppers yet. This doesn’t seem, at least
technically, like a hard thing to do. Certainly for a three or four
year kind of roll this out, make it work, and then try to
get the government to take the service over and do it
themselves the way they ought to be doing it. So that’s a basic outline. If you look at my links, I
have a copy of this slide available at public.resource.org/google.techtalk.html
and I’ve got my email address
[email protected]; feel free to send me email. You’ll find at the bottom
there’s a link to the business plan which is a very large pdf
document and kind of goes over this in great detail including
the finances. Questions? AUDIENCE: So are there now
cameras in most of those rooms that are hooked to the fiber? CARL MALAMUD: The question
is are there cameras already in the rooms? So that’s actually an
interesting question. Some of the committees have
feeds, sometimes the media is already in there; it’s spotty. And so our strategy is a
threefold strategy. One is to find as many hearing
rooms that already have cameras and put splitters on
them, with the permission of the committees and the agencies,
and take those feeds and run them out. So that’s part one. Part two is to get credentials
as real media. And I actually had credentials
for Internet Talk Radio with the congressional galleries. And so to get those credentials
and be able to participate in the pool feed
operation in which one camera is in there, but everybody
gets a copy of it. And then the third strategy is
to buy ten of these $3,000 cameras that are now coming
out and get our staff of twelve, which is what we project to
be the staffing level with this operation, and send them
in to mop up all the stuff that isn’t there. And we’re not going to be able
to do the whole government, but I think we can do 16
simultaneous feeds using those three strategies. Other questions? Yes? AUDIENCE: [INAUDIBLE]? CARL MALAMUD: Whose ox is this
going to gore is the question. That’s an interesting
question. You know it’s very hard for a
congressman to be against transparency at least
on the record. You might think that GPO would
not like this, the Government Printing Office, because we’re
kind of going in and doing their job, and you know doing
it as a freelance nonprofit independent operation, but they
seem to be pretty good about this. Major media might not like this
because right now it’s pretty much major media
that does at least the important coverage. And what’s going to happen,
I’m pretty sure, is we’re going to be someplace
that they aren’t. And that means that they’re
going to have to get their video from us in order to cover
the news which they’ll be able to do free. All this data, by the
way, is free. We’re not charging anybody
for the feeds. We’re funding the project
on a whole basis. So that was pretty important to
me that we not try to do a cost recovery because then we’d
end up negotiating with Verizon’s IPTV group and it just
seemed way too difficult. Major media might not like it
because they may find end users going to Google Video
instead of waiting for NBC for their nightly news. And I’m expecting a lot of
people to start mashing up this stuff. Not everybody’s going to look
at the whole hearing and annotate it and intensively
work with it. But I think there’s enough
bloggers out there and other folks that are going to take
it, find the relevant snippets, slap it out
on wherever they are going to put it. And so I think there’s great
potential for large viewership on at least snippets
of these pieces. Yes? AUDIENCE: [INAUDIBLE]? CARL MALAMUD: The question
is where are you going to get the metadata? That’s actually a
difficult one. For example, you really want
transcriptions on these things if you’re doing them
properly, right? We’re not going to have
the staff or the budget to do that. And so there’s really
three techniques I think for doing metadata. One is we will do the best we
can to get as much as possible in kind of a broadcast management framework:
time, location, speakers, scanning in
any paperwork. Number two is we’re going to do
as much automated stuff as we can, and that’s the research
test bed kind of aspect of this. Number three is allowing the net
to do as much annotation as possible. Because I believe that there’s
people interested in anything you put online particularly if
it’s the congress or it’s the executive branch, various
interest groups. One idea is to try to marry XMPP
Jabber protocols to the video streams. I’ve been told
that that’s hard to do, but that’s a possible approach. And there’s actually a fourth
answer to this which is we’re going to do the best we can, on
the other hand, we’re going to hand it over to folks like
Google Video assuming you want this data, and presumably you’ll
do a much better job than we will because you run
a production operation. So I guess that’s answer
number four. So the real answer is we’ll do
the best we can, we’re hoping that the net will do intensive
annotation, and that a lot of the metadata
will accumulate very quickly over time, therefore allowing
us to provide various search facilities and locate
what we’re looking for and things like that. A lot of people are counting on
speech to text, but I’m not sure I’d bet the farm
on that one yet. AUDIENCE: Do you plan to do
anything to facilitate the annotations so that it’s
there [INAUDIBLE]? CARL MALAMUD: The question is
are we going to do anything to facilitate annotation? We’re doing everything we can
with the staff of 12 and the people around us working
on the project. The way we’ve structured this,
like I said, is as a 501(c)(3), a public trust. And we’re setting
up what I call the council of public engineers;
I kind of like that name. And basically each of the
sponsors is throwing three people in. So Cisco’s got a Cisco fellow, a
distinguished engineer, and a team leader. Sun’s got three
really good people, including one of the guys that did the Honeycomb
storage architecture. And the hope is that by drawing on these
advisers and the staff and other volunteers and people
interested we’ll be able to do something. But I don’t want to wave my
hands too much and say gosh we’re going to provide
broadcast quality 16 channels a day. I mean this could be 50 hours
of video every day. That’s likely to happen
once we’re up to speed. You had a question? AUDIENCE: [INAUDIBLE] if Fed stenographers going, any
chance of getting a real time feed from there? CARL MALAMUD: The question is
can we get a real time feed from the stenographers? You bet I’m gonna try. Typically there’s two
different kinds of stenographers. There are the ones
that work for one of the commercial services
like FedNet that sell this stuff, right? To lobbyists that need
that real time transcript right away. And then there’s a stenographer
for the agency or the committee that is
keeping records. Typically, what happens there
is they do their stenography and then it goes off into the
committee process and three months or six months later
it comes back out. I will certainly go to them and
say gee if we got a copy of what you are doing we could
put it online it’d be great. You would save on your
web casting charges. Because right now these
committees go retail to the extent they do web casting, and
they hire some streaming media company and if they have a
hit on their hands it breaks their budget. And so I think our proposition
is pretty attractive. Which is we’ll put your stuff
out in high res, as well as lower res transformations
for people. And I think that’s an attractive
proposition. My hope is that if we can
convince one committee chairman that it’s time to
upgrade their cameras and make them decent instead of the $50
webcams, and that maybe they ought to give us a copy of
the stenographer’s stuff, I think every one of the other
committees would immediately begin doing the same thing. And so it’s a question of can
you convince the first couple that gee they would look
a lot better if they bought better cameras. They would look a lot better
if the metadata were immediately available. Steven? AUDIENCE: So I’m heartened by
your mention of FedNet. Because one of the themes
in the [INAUDIBLE] Smithsonian stuff is taking an
intermediary and eliminating or minimizing their influence
over the information and the terms under which it’s
distributed. So in this case it initially
seems like there was no intermediary. But now that you mentioned
one, what’s their deal? What other intermediaries
exist? To what extent do you think this
will change or eliminate the viability of their business
model and since you mentioned lobbying what
impact do you expect this to have on lobbying? CARL MALAMUD: Well there’s a
couple questions in there. So how’s this going to affect
people like FedNet? The intermediaries
are of two sorts. There’s the specialized
Washington insider intermediaries or the I
provide service to the committee on a commercial
basis intermediaries. How is this going to affect
their business model? I don’t know. I don’t care. It’s probably not going
to be good. The other intermediaries are
major media, and those are the ones that are really going
to be affected by this. Because right now it’s the
lobbyists that get that stuff in real time, and then, if NBC or
C-SPAN decides it’s worthy of your attention, maybe it’s
available, or maybe it’s available now but it’s
gone in 30 days. And so I think this will really
have an effect on the journalists and particularly
the major media operations. I think it’s going to have
some effect on the congressional committees and
the agencies because what’s going to start happening is
people are going to watch their stuff live, they’re going
to be commenting in Google Talk chat rooms live, and
that means that the staff is going to have to monitor
those chat rooms, and it’s going to be a different
dynamic. And one of my hopes is that we
get that feedback cycle going right back in to a lot
of these hearings. So that if somebody says
something silly, the staffer gets email or looks on a chat
room leans over to the congressperson and says
that was wrong. And then the congressperson
looks at the witness and says well as I understand it what
you said wasn’t right. And so hopefully, there’ll be
more of a feedback loop there. Question in back? AUDIENCE: So you’ve
been very polite. You’ve sort of said, gosh, I’d
like to know what you guys are working on, what are the
interesting things that might map up with this. So more ruthlessly, what’s
your fantasy scenario? What should I do to help? You’ve got a bunch of people
that really would like to be able to do something this way. So what would be a couple
of hard items that we could achieve? CARL MALAMUD: Well there’s
a couple things. Vint Cerf said he’d be happy to
join the board of directors of this new– AUDIENCE: Could you repeat
the question? CARL MALAMUD: Oh, I’m sorry. The question is what is my
fantasy scenario about what Google might do to help
this project happen? So Vint Cerf said that
he’d be happy to join the board of directors. He’s a busy guy,
but he’s good. And I’ve worked with him in the
past, I really like him. Steve Wolff would be
joining from Cisco. He used to run the
NSFNET program. And so we’re looking at
a pretty good board of directors here. There’s this council of public
engineers, so it’d be really nice if three people from
Google, people like Stephen or others said gee I’m happy to be
an adviser to this project. Cash would also be nice. AUDIENCE: And in terms of
product related stuff– CARL MALAMUD: Yes. AUDIENCE: –how should people
be able to search for what [INAUDIBLE]? CARL MALAMUD: The question is
what can Google do to help on the kind of product
related stuff? I think there’s a few
answers to that. I’m really impressed with not
only the Google Talk team but the Google Video team and also
your engineering expertise in building big nets and data
centers and things like that. And I do think we’d get a lot
of really good advice. I also think there’s a lot of
interesting things we can do on our data that you might not
want to like roll out in production right away. So it might be an interesting
test bed. Again, my kind of fantasy idea
of taking the XMPP stuff and marrying it with the video
streaming or if not XMPP some other mechanism of allowing
the net to do intensive annotation of the streams not
only as they occur but also after the fact. Because I think one of
the things you can do is stream live. But then if you’re able to have
that permanent archive for anybody to use maybe
somebody goes in a little bit later a little bit later after
that and adds more information and the stuff becomes more
useful over time. And so really working with
people is really what I’m looking for in my fantasy
scenarios. That people get interested
enough in this that they want to participate with us at the
Public Memory Trust, but also that you folks are ingesting
this video and that you find it useful for your
own products. Question? AUDIENCE: Is part of the
project to provide a centralized annotation, kind
of like a Wikipedia type [INAUDIBLE] or is it more distributed
in your mind? CARL MALAMUD: The question
is do you have kind of a centralized annotation facility
or do you just let Google Video do one annotation,
we do one, somebody else does one? I think that’s a research
question because I’m not sure how I would do– it has to be distributed,
right? I mean certainly with XMPP you
have distributed servers. And so it’s got to be
distributed particularly in institutional boundaries here. Is there a way of setting up a
common URL scheme, a common stamping of the data so we can
know what we’re talking about? Is there some steganography or
water marking or something we can do on there? And then is there some mechanism
we can use that allows you to create lots of
value on the data, participate in a global whole, and still
keep your business interests? And I think that’s one of
the things that we would want to work on. The core thing, streaming
video out, I don’t think is hard. There’s some difficult
engineering. We’ve got to configure
the routers, right? We’ve got to make sure we’re
peering properly, we’ve got the proper transit, we’ve picked
the right formats. But I think this is all
straightforward, it’s hard work, but straightforward. The how do you add value on
the data particularly that much data and particularly with
potentially very large audiences I think is something
we all struggle with. And that’s one reason I’m
attracted to this because I wouldn’t mind spending the next
few years working on that stuff because I think it
would be interesting. AUDIENCE: I’m going to disagree
respectfully– CARL MALAMUD: OK. AUDIENCE: –that video
distribution is merely going to require hard work. And that’s because I think that
to get the penetration that you want where you’ve got
it such that there’s an active feedback, I think that requires
a sufficient number of viewers, that you’ve built
this community where people were commenting and wanting to
be heard by the staffer who’s sitting and able to whisper in
a Congressman’s ear to, if you’ll pardon the phrase, call
bullshit on the witness. And that smells like
multicast to me. CARL MALAMUD: Yeah. So Steven’s point is that the
engineering on the video distribution is not necessarily
a no brainer. I think I agree with him. So let me clarify my
previous comment. I think it’s hard work. And one reason I think this is
doable is because there are people like Steven interested
in this, people like Randy Bush who does IIJ, Hank Kilmer
and Andrew Partan actually helped me do the initial
engineering on kind of the sample configs. Hank ran Sprint’s backbone, and
Andrew did UUNET and then went and did Verio. And so I understand the routing
issues are difficult, and the bandwidth provisioning
issues are difficult, but I think there is great interest
among the routing people on solving this problem
and doing it. And so it’s an issue I’m less
worried about because I know the caliber of the people that
are interested in the problem. AUDIENCE: [INAUDIBLE] service providers. To what extent do you think that
this content might be one of the things that brings
multicast to a point where service providers actually
want to deploy it? CARL MALAMUD: Question is would
this make multicasting like something that people
actually do? That would be great. Internet2 is multicast enabled
and so it’s kind of a no brainer to feed it on the
internet, to send it out there, that gets it out to
all the universities. Now, whether they’re multicast
enabled inside of the campus networks that’s another story. But you know the big ones
pretty much are. That’s actually pretty decent
infrastructure. We’ve been going since gosh at
least 1990 on the ISPs should do multicast and so far at
least it hasn’t happened. And I agree it’s a proper
way to distribute this kind of stuff. And a lot of people do multicast
but then there’s also kind of the streaming– people want to see it when they
want instead of when it’s actually there. There’s some interesting
protocols that dynamically build multicast tunnels to the
edges that people like Juniper and Cisco have been working on
which is a possibility and some people think that might
be a carrot and a stick. The way this works is when you
ask for a multicast address instead of your whatever video
player simply saying we don’t have multicast go away, you use
a little plug in on your PC, it goes to the router and
says I want this multicast address, the router uses
anycast to go to a central directory, then dynamically
builds a tunnel. And so the theory here is that
if enough people are going in and dynamically building
tunnels over the ISP’s backbone, at some point they
say OK, OK, OK, we’ll do multicast. I’m a little
skeptical. I did a lot of MBone
broadcasting, multicasting whatever, long ago, and I’ve
been watching the multicast stuff and it’s been terribly
frustrating that it hasn’t like moved into the core
infrastructure. And you know there’s a lot
of reasons for that. A lot of the ops guys are just
like this is too difficult it’s too hard. I can’t train people to provision the routers properly. It would be great if this ended
up being a carrot and stick that helped
move that along. AUDIENCE: Have you guys given
thought to some of the other distribution methods like
peer-to-peer type application, where you have an application
that somebody downloads and it makes the distribution on
the video a lot easier? CARL MALAMUD: The question is
have you looked at other distribution mechanisms
like peer to peer? Certainly BitTorrent service on
the entire archive is a no brainer, right? That’s like just do it. I think they’ll be a lot
of other things that we can do, as well. And yes, I would like to see
that particularly for the more popular things. When something hits the
blogosphere and everybody gets all concerned, I would like to
figure out a way to get that stuff out really quickly to
folks, and to get the bigger stuff out there. Because, you know, a lot of
people are getting real frustrated with 240 by 340
stuff, and the lines are getting faster. And people are getting able to
be able to digest bigger things, and people are getting
smart enough that they’re willing to say well start
downloading now I’ll come back in the morning. And so it is possible to begin
thinking about gigabyte distribution of video or
hopefully at some point in the future we can even start going
up to the full res stuff. I do a lot of work on the
high-def, which is 160 gigabytes per hour uncompressed,
and you can compress it down to eight to
16 gigabytes per hour when you’re doing a variety of
compression techniques. But ultimately, that’s still
a little too big for folks. And so the peer to peer stuff
is one way to get that stuff out there at least in chunks
and get it out to people. Any other questions? Got one here and
then one here. AUDIENCE: So how are you going
to deal with the simple logistics of when do meetings
start and things like that. Why is the camera pointed
off at a weird angle. It seems like you need somebody
to look at each feed at least for a minute when it
starts for things like this. Are you going to have a staff
to do that or how do you envision that will be? CARL MALAMUD: Question is how
do you do the basic quality control of making sure
the camera’s not pointed out the window? We’re looking at 12 staff:
an executive director, a facilities manager, five network
engineers, five video engineers, and one of the job
requirements is everybody needs to be able to
run a camera. We’re going to do some simple
broadcast management stuff. I don’t want to spend the big
bucks on the cable head and fancy stuff. But yeah, you’re right. We are going to have to
sit there and monitor. Now the good news is Washington
starts at 9:00 and stops at 5:00. And so I think it is possible
to intensively monitor what you’re doing and do a pretty
good job on it. But people will be running fast.
We are going to look for an executive director that
understands that part of his or her job is going to be
holding that camera a couple hours a day, in addition to, the
business planning and the board briefings and whatever
else they do. Peter. AUDIENCE: Are you involved
with C-SPAN? Do they care what
you’re doing? Do they support it? CARL MALAMUD: The question
is, is C-SPAN supportive? C-SPAN has– I went and talked to them
and said here’s what I’m thinking of doing. I don’t want you to think this
is competitive and so I have met with the founding
chairman. I’ve talked intensively with
one of their executive vice presidents, their chief
operating officer. They’re extremely supportive
whether they’ll participate or not I don’t know, but they
don’t view this as competitive. One of the issues that they
learned with the Washington correspondents’ dinner and the
Colbert speech is that a lot of people on the net think their
stuff is public domain. Well it isn’t public domain. And they have very strong
feelings about things like events being viewed in their
entirety, they don’t want people using their stuff
for political ads. And so they view it as a very
positive thing that there would be a large public domain
archive of stuff that they can point people to. They also view this as a
potential source of video, because they only have so many
bodies they can only be in so many places, and like I said, no
matter what you cover there might be something happening in
another room which ends up being newsworthy. And so I do think in the long
run Major Media is going to be forced to deal with this service
because they’re just not going to want to miss in
case something does happen. Yes? AUDIENCE: Does your scope
include the floor of the House and Senate or anything
like that as well? Or just [INAUDIBLE]? CARL MALAMUD: Question is does
this include the floor of the House and Senate? The good news is the floor of
the House and Senate is very easy to do. It’s done basically by
the congress with C-SPAN acting as operator. That stuff is public domain if
you’re a member of the gallery you’re able to do that. And like I said, I’ve previously
had broadcast credentials with the gallery. I actually put the floor of
the House and Senate audio online in ’93, and we had a
cool little search engine working on that. We had a guy from MIT come
down and do speaker ID. It turned out it wasn’t hard
to do for the floor because there’s a limited number
of people. We brought in the congressional
record, coupled the speaker ID together with a
little clever programming that he did, and so we actually had
a search engine where you could say I want to hear all
democrats from Minnesota last week who spoke about the budget
and you would pull up the audio as well as
the transcript. Now the thing that I like the
most about this service was the negative query which was
show me all audio that does not have corresponding text. Because congressman are allowed
to revise and extend their remarks and often they’ll
go in and just like leave this stuff out of the
congressional record. And with our search engine you
can actually find, in one search, all of those speeches. Showed that to some senators
and they were like oh. That was Deb Roy who did
the programming. He is now professor
at the Media Lab. Very very good at audio and
video processing techniques, he does some really
cool stuff. Steven. AUDIENCE: I feel like opening
another can of worms. CARL MALAMUD: Another
can of worms. AUDIENCE: Let’s say you’re
successful [INAUDIBLE]. And you’re successful sooner in
a more widespread way than you could even imagine
standing here. And there’s infrastructure to
distribute video so that every broadband subscriber in America
can watch whatever meeting they want to. Let’s say further that this
whole net neutrality falls in the favor of the people who
control those broadband lines, being able to prioritize
whatever they want. And my further [INAUDIBLE] that there’s some truly
interesting content coming out of Washington in the next
couple years, say an impeachment hearing, and that’s
deemed to conflict with the internet release
of Terminator 4. What happens? CARL MALAMUD: I sue somebody. The question is so
what happens if we get a success disaster? This is great, everybody loves
our stuff, and somebody, Comcast, whoever, decides that the
impeachment hearings are not as important as the imminent
release of Terminator 4, what do we do? There’s some real first
amendment issues right there, and the congress is not going to
be happy if their stuff is not reaching end users. I think that would certainly
illustrate the net neutrality point pretty well. Now, on the other hand, as you
know net neutrality can be two different things. It can be I’m going to cut off
your video or your voice over IP or whatever because it’s not
my service and I’m selling my own, or it can simply be I
will provision my service better than your service. And that’s kind of a shades
of gray issue there. If they cut off the impeachment
hearings, I think there’s a huge issue and
that’s pretty simple. You go to Boing Boing, you tell
them what’s happening, you write some letters to the
capitol, and let the machinery go into action. If it’s degraded, and theirs is
a little better, and your stuff isn’t getting through,
then I think it’s a more difficult issue. That’s a real issue with the net
neutrality thing is how do you legislate fairness? It’s a real tough thing because
as you know Steven, whenever you engineer the
net you’re always making trade offs. And the position you’re getting
out of the Comcasts and the Verizons is well you
know this is just reasonable, normal engineering
we’re doing. But what tends to happen is
they go over the edge. They say well OK just because
we’re doing a product we’re gonna cut off all this
other stuff. And then all of a sudden
it becomes a real issue for folks. There’s another follow
up on what if you have a success disaster? I hope that at some point, like
I said, the Library of Congress and the libraries of
the world have permanent archives as does Google Video
and a bunch of other people, I hope all these committees have
got beautiful cameras and they’re sending stuff out, and
the Government Printing
fiber infrastructure for getting this stuff out to the
net in a clueful fashion. What do we do then? Well this model of putting every
public briefing online in a place can be replicated
at the state level, at the local level, in other
capitols. And so I view our mission as
putting ourselves out of business as quickly as we can in
Washington and selling this bridge back to the government
or getting them to take it over, and then taking whatever
money we have left or other money we’re able to raise and
going out and basically trying to teach other people how to do
the same thing at the state level, the local level, and
in other countries. So replication of this, I think,
is a useful thing. Washington’s a great place to
start because there’s so much content there and the
intellectual property issues are really well defined. This stuff is public domain
no matter what. It’s not quite as clear in a
few other instances in some states the infrastructure isn’t
quite as developed. You know things like the Verizon
AVOC isn’t there, and the fiber infrastructure
isn’t there. But I do think in the long
run, if there is a public proceeding happening I think
the public needs to have access to it both
now and forever. If it’s a public proceeding,
it needs to be archived. Other questions? My radio station, by the way,
when I ran it was actually called Internet Talk Radio. But I got a call from the New
York Times about 1994 and they said, I was putting the congress
online and they were writing an article, and they
said what’s the name of your radio station again? I said it’s RT-FM OK. I got a call back the next day
from the editor going what did that stand for again? I was like I don’t believe
this it’s radio technology for mankind. And sure enough they
printed it. And so I’ve been doing business under RT-FM for a while. I’ve got some RT-FM stickers
up here that are left over. I crashed the United Nations
summit in Tunisia as a member of the media and brought a
couple cameras and so I was actually there under
the RT-FM rubric. I had an RT-FM hat and a little
badge and filmed that whole thing. So there’s some extra stickers
if you folks want. Any more questions? Thank you very much.

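The "negative query" described in the talk (show all floor audio that has no corresponding text in the congressional record) amounts to a set difference between audio segments and transcript entries. The sketch below is not the 1993 system; the record layout, field names, and sample data are all hypothetical, and it assumes speaker ID has already tagged each audio segment with an identifier that the transcript can be matched against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSegment:
    # Hypothetical fields; assumes speaker ID has already labeled each segment.
    segment_id: str
    speaker: str
    party: str
    state: str


def positive_query(segments, party, state):
    """Segments spoken by members of the given party and state."""
    return [s for s in segments if s.party == party and s.state == state]


def negative_query(segments, record_ids):
    """Segments whose IDs have no matching entry in the written record."""
    return [s for s in segments if s.segment_id not in record_ids]


# Made-up sample data for illustration only.
segments = [
    AudioSegment("s1", "Member A", "D", "MN"),
    AudioSegment("s2", "Member B", "R", "TX"),
    AudioSegment("s3", "Member A", "D", "MN"),
]
record_ids = {"s1", "s2"}  # segment IDs that do appear in the written record

missing = negative_query(segments, record_ids)
print([s.segment_id for s in missing])
```

A real system would have to align audio to text by time or content rather than by a shared ID; the shared ID here just keeps the idea visible.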
