Ganglia: 10 years of monitoring clusters and grids
Formal Metadata
Number of Parts | 97
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/45705 (DOI)
FOSDEM 2010, 70 / 97
Transcript: English(auto-generated)
00:13
OK, so I think I'm going to get started. So hello, everyone. My name is Bernard Li.
00:20
I'm one of the project administrators of the Ganglia project. So today I'm going to talk to you about Ganglia, which is monitoring software. And the project has been around for 10 years. So we're going to tell you what the software does and why you would want to use it to monitor your computers, specifically clusters, grids, or just
00:44
web farms. So actually, before we begin, I just wanted to have a quick poll. So who has heard of Ganglia? So who has actually used it? So it seems like a fair amount.
01:02
Maybe you guys know quite a bit about it already. OK, well, actually, let me just briefly introduce myself. So I'm Bernard. I've been working on open source software related to high-performance computing. So I've worked on provisioning tools like SystemImager,
01:21
OSCAR, and on the monitoring side, on Ganglia. So for the past few years, that's what I've been working on, just getting involved with a lot of open source software. So Ganglia. So what is Ganglia? So the goal is, basically, the software
01:43
is to gather system resource metrics in real time so that you can figure out what your host is doing. So whether you have one host or hundreds or thousands of computers, you set them up, and they're running.
02:01
But you want to find out what the system resources are like and what the systems are doing. So when you have a set of systems, you want software that gives you a centralized view of what's going on. So that's what Ganglia's goal is.
02:20
So the project was started around 1999 by Matt Massie, at the University of California at Berkeley. It was part of the Millennium Project, a project that was involved with building clusters. And they wanted to find out what the cluster's doing,
02:42
how its system load is, and things like that. So Matt wrote this software. And for the past 10 years, basically, when you think of monitoring for clusters, you think of Ganglia. It has somewhat become the de facto standard for monitoring system resources.
03:02
So it's a very lightweight process. So when you monitor these systems, you don't want your monitoring daemon to actually take up a lot of resources, because that would be very wasteful. You actually want to do real work on your computer. So if your monitoring software gets in the way of that,
03:20
that sort of defeats the purpose. So it's very lightweight in terms of CPU and memory usage. It doesn't use that many resources. And basically, you have a monitoring daemon called gmond, and you run it on every node. And basically, all the metrics that are collected on each host are aggregated on a separate server, which
03:41
runs the gmetad daemon. And these metrics are aggregated into round-robin database files. So RRD files are basically time-slice data. So it's good for storing this metric data so that you can go back in time and look at what your system has been doing in the past.
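As a rough illustration of why round-robin storage suits this kind of data, here is a minimal Python sketch of a fixed-size series that overwrites its oldest samples. It is not how RRDtool is implemented; the class name `RoundRobinSeries` and the sample values are made up for this note.

```python
from collections import deque


class RoundRobinSeries:
    """Toy fixed-size time series: the oldest sample is dropped when full.

    Hypothetical illustration only; real RRD files also consolidate
    samples into coarser archives, which this sketch does not do.
    """

    def __init__(self, slots):
        self._samples = deque(maxlen=slots)  # bounded storage

    def update(self, timestamp, value):
        self._samples.append((timestamp, value))

    def fetch(self, start, end):
        """Return the samples with start <= timestamp <= end."""
        return [(t, v) for (t, v) in self._samples if start <= t <= end]


series = RoundRobinSeries(slots=4)
for t in range(0, 90, 15):          # six samples, 15 s apart
    series.update(t, t / 100.0)     # made-up load values
recent = series.fetch(30, 75)       # only the four newest samples survive
```

The point of the design is that storage never grows: the cost of keeping history is fixed when the series is created.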
04:05
Again, it's a very lightweight agent. It supports most Unix, Linux systems, and even Windows via Cygwin. So basically, you can run it on anything.
04:21
And it doesn't matter whether you're running Ganglia on these different OSes. They will all work with each other. So you can have a mixed, sort of hybrid system. As in many large corporations, you would run different OSes, and then you can use one tool to basically monitor everything.
04:44
I keep pressing the wrong button. OK, so it's under the BSD license, an open source license. So what I'm going to talk about: basically, what you can do with this software.
05:02
What does it look like when you actually use it as a user? And a bit about the architecture and some advanced topics. So by default, Ganglia collects 30 or so metrics about your host, like CPU load, memory, network, and all that stuff.
05:21
So those are the ones collected by default. But if you want to collect your own metrics, like how your Apache server is doing, or your memcached, basically anything you can somehow collect from your operating system, you can plug this information into Ganglia. So I'll go into that in a little bit of detail.
05:46
Ganglia is very scalable, but we're going to talk about some issues you run into with thousands of hosts or tens of thousands of hosts. And cloud computing is quite a hot topic nowadays.
06:02
So I'll give you some brief notes about what that environment is like if you want to use Ganglia to monitor it. And then we'll have Daniel Pocock come up and give a user testimonial. And then basically I'll end with how you can get started
06:23
and get involved with the project. So, typical users. The project came out of high performance computing. So these are clusters of computers that basically have one goal, and it's just to crunch a lot of numbers, run a lot of parallel code.
06:42
And Ganglia came about, and it makes it very easy to figure out what your cluster is doing. So it has this hierarchy of a grid and a cluster so that you can find out
07:00
how your cluster's doing on one end and then, across the other clusters, you can sort of aggregate all the different data. And in large enterprises, you have different servers like web servers, database servers. You have a large corporation there. You have many computers that do different things. So you can use Ganglia to check
07:23
how these servers are performing. And then you can go back in time, look at the history and figure out what's going on. So the uses are pretty similar. And in your IT environment, you have support issues
07:44
and why is your system not performing as you think it should? So you can also use it to look at memory utilization and when your servers reach a certain load and maybe it's time to buy new computers or actually upgrade your memory or whatever.
08:01
So you can use ganglia to look at all these pretty graphs and it gives you an idea of how your systems are performing. And then you can see, okay, if a whole bunch of servers have really high load, maybe you can shift the load around and maybe even virtualize it.
08:24
And you can use it to troubleshoot applications. So you have different users running different code on your computers and maybe you're trying to figure out what is causing a problem. First of all, you need to know that your system's having high I/O load, but how would you tell?
08:42
If you have a thousand computers, you're not gonna log into each one to run top and figure it out. So with something like Ganglia, it has these graphs with aggregated information. So basically, you can see very quickly what your systems are doing. And with that information, it helps you troubleshoot
09:01
application problems and things like that. And when you're writing new software, sometimes you don't know how it performs, and you don't know how much resources it uses. So again, Ganglia is useful for these kinds of workloads.
09:26
So just to give you an example of some people who use Ganglia. These are just names you can find on our website. There's a little bar on the side that tells you who uses Ganglia. So I'd like to point out especially Flickr.
09:43
So I know the previous operations manager, who was always saying, use Ganglia. What does he use it for? It's for capacity planning. You have these graphs that tell you, okay, well, we've sort of hit the resource wall. Maybe it's time to buy new computers. So you would go to your supervisor,
10:03
your manager and say, okay, well, I mean, this is the real load and we need more computers to handle these loads. So it's good for that. So let me give you a quick demo
10:22
of what Ganglia looks like. So this is the Berkeley grid. So you see here, you have a main grid. So this is sort of like the top level. It aggregates all the metrics you see at the bottom here.
10:44
So this is one cluster which you can click into. So this red line just tells you, okay, this is the max, the number of CPUs in this cluster and so the number of running processes here,
11:03
and the gray stuff is the load. So there's a whole bunch of different types of charts that you can see, like memory and network. So down here, these are individual hosts. So red here means it's sort of a high load
11:22
and green means it's not that busy. So again, you can click into it here and see what each individual host is doing. So this is one host. It says, you know, it's been up since this time and it gives you a lot of information.
11:41
But so basically, all these metrics are collected at the host level and aggregated up to the top. So a collection of hosts is a cluster and a collection of clusters is a grid. So you can actually even have a grid of grids so that you can sort of aggregate it all the way up.
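The host, cluster and grid hierarchy described above can be sketched in a few lines of Python. This is only an illustration of the roll-up idea, assuming simple sums; the function names and the sample hosts are hypothetical, and Ganglia's own summaries are more involved.

```python
def summarize_cluster(hosts):
    """Sum per-host metrics given {host name: {metric name: value}}."""
    totals = {}
    for metrics in hosts.values():
        for name, value in metrics.items():
            totals[name] = totals.get(name, 0) + value
    return totals


def summarize_grid(clusters):
    """Sum the cluster summaries given {cluster name: hosts dict}."""
    return summarize_cluster(
        {name: summarize_cluster(hosts) for name, hosts in clusters.items()}
    )


# Made-up data: two clusters with three hosts in total.
grid = {
    "cluster-a": {"node1": {"cpu_num": 4, "load_one": 1.5},
                  "node2": {"cpu_num": 4, "load_one": 0.5}},
    "cluster-b": {"node3": {"cpu_num": 8, "load_one": 2.0}},
}
grid_totals = summarize_grid(grid)
```

Here the grid view reports 16 CPUs and a total one-minute load of 4.0, which is the kind of number the top-level graphs in the demo are built from.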
12:01
So these are stats of individual hosts. So it's pretty self-explanatory. Okay.
12:22
Okay, so let's talk a bit about the architecture. So every node runs the gmond agent. So that's like what you run on individual hosts.
12:40
And it doesn't keep any historic data locally. The data just gets sent around. The metric data is transmitted, by default, using multicast. So in environments where multicast could be considered chatty,
13:01
like you don't want to send too many packets, what you can do is use unicast UDP packets, which reduces the amount of network traffic. And then basically you have this gmetad server that aggregates all the data and stores it in RRDs.
13:23
So I think I mentioned that previously already. And then all this information is presented by the web server, which basically serves the web page you saw. And you install a web server like Apache or lighttpd or whatever,
13:40
and basically it runs on the same server as your gmetad process. And it's used to create the graphs and the charts that you see. So let's just run through what it looks like. So by default, it uses multicast because it's very easy to set up.
14:01
Setting up is basically just starting the daemon. The configuration by default uses multicast. So every node transmits its own metrics to the multicast group. So you don't need to do anything special. And every node receives metrics from every other node. So essentially you talk to one node,
14:23
and it would know the metric information of every node in that multicast group. So again, every node already knows the metric information of all the other guys.
14:41
And the node can actually be polled on a specific port, and then it will give you an XML output of what the metrics look like. And that's what we use to send the information around. So then we have the gmetad server
15:00
and the web server that aggregates all the data. So in the multicast environment, the gmetad server polls any one of them. For those of you who have used it, the example sort of implies that you have to add each host. There's a data source that points to a particular gmond host.
15:21
So the configuration implies that you have to put in every host in the multicast group, but that's not necessary, because it's mainly for redundancy. In case one host in your multicast group goes down, you can go to the other guys. But basically you just poll one host in the group and you'll get all the information of all the hosts.
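To make the polling and redundancy idea above concrete, here is a hedged Python sketch: try each listed node in turn and parse the XML dump from the first one that answers. It assumes gmond's TCP port is 8649 (the usual default) and that the dump uses `HOST` and `METRIC` elements with `NAME` and `VAL` attributes; the function names and the sample document are made up, and this is not gmetad's actual code.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=5.0):
    """Read the whole XML dump a node writes when a client connects."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_data):
    """Return {host name: {metric name: value string}}."""
    root = ET.fromstring(xml_data)
    return {
        host.get("NAME"): {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        for host in root.iter("HOST")
    }


def poll_cluster(candidates, fetch=fetch_xml):
    """Try each listed node in turn; one answer covers the whole group."""
    last_error = None
    for host in candidates:
        try:
            return parse_metrics(fetch(host))
        except (OSError, ET.ParseError) as exc:
            last_error = exc
    raise RuntimeError("no node answered") from last_error


# Offline demonstration with a stand-in fetcher and a made-up document.
SAMPLE = """<GANGLIA_XML VERSION="3.1" SOURCE="gmond">
<CLUSTER NAME="demo">
<HOST NAME="node1"><METRIC NAME="load_one" VAL="0.42"/></HOST>
<HOST NAME="node2"><METRIC NAME="load_one" VAL="1.10"/></HOST>
</CLUSTER>
</GANGLIA_XML>"""


def fake_fetch(host):
    if host == "node1":
        raise ConnectionRefusedError("node1 is down")
    return SAMPLE


metrics = poll_cluster(["node1", "node2"], fetch=fake_fetch)
```

Even though `node1` refuses the connection, the answer from `node2` still contains the metrics of both hosts, which is why listing more than one node is only needed for redundancy.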
15:45
And so RRDs are created to store the metrics. And then from the web browser, you can see the graphs and the charts to see what your installation's doing.
16:01
So I'm just gonna talk about some advanced topics. So I mentioned previously that, by default, Ganglia collects all these standard metrics, but what if you have your own metric that you want to collect that's not part of the standard metrics?
16:21
So we have this command line tool called gmetric that you can basically feed metrics to. And typically you write a script that gets all this data. For example, you write a program to get the temperature reading of your host; usually there's some command that you can run
16:41
and you would feed it to gmetric, and then your cron job would run it every couple of minutes just to feed the information. So in newer versions of Ganglia, we wrote a module interface to gmond so you don't need to use gmetric
17:02
anymore, so basically you can write C or Python code, and the module has a callback, and basically you set a value for how often gmond gets the metric, and this is just snippets of code that you write to basically collect the data.
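For comparison with the module interface, the cron-plus-gmetric approach mentioned above might look roughly like the following Python sketch. The `gmetric` options used (`--name`, `--value`, `--type`, `--units`) are the ones I believe the tool provides; `read_temperature` and the metric name are placeholders, not part of Ganglia.

```python
import subprocess


def read_temperature():
    """Placeholder: replace with whatever command or sensor read you have."""
    return 41.5  # made-up value for illustration


def gmetric_command(name, value, value_type, units):
    """Build the gmetric command line for one metric sample."""
    return [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", value_type,
        "--units", units,
    ]


def publish(name, value, value_type, units):
    """Run gmetric; requires Ganglia to be installed on this host."""
    subprocess.run(gmetric_command(name, value, value_type, units), check=True)


command = gmetric_command("temperature", read_temperature(), "float", "Celsius")
# A script that calls publish(...) and is run by cron every few minutes would
# push the value; publish is not called here so the sketch runs anywhere.
```

The drawback the speaker points out is visible here: scheduling lives outside Ganglia, in cron, rather than in gmond itself.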
17:22
So I'll show, I think the next slide shows you how it works. So in this case, you don't need to worry about like having a cron job. So the gmond process would be in charge of like periodically getting all this information.
17:42
So this is just a pretty stripped-down example of what this module interface looks like. Basically what this does is generate a random number and feed it into gmond. So the first definition basically does all your work.
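The slide itself is not part of the transcript, so the following is a reconstruction, not the speaker's code: a minimal Python module in the shape I understand gmond's Python module interface to expect (`metric_init`, a callback, `metric_cleanup`). The metric name and descriptor values are made up, and the descriptor keys should be checked against the project's wiki documentation.

```python
import random


def random_callback(name):
    """Called by gmond each time it collects the metric called `name`."""
    return random.randint(0, 100)


def metric_init(params):
    """Called once by gmond; returns the metric descriptors to register."""
    descriptor = {
        "name": "random_number",       # hypothetical metric name
        "call_back": random_callback,  # the function that does the work
        "time_max": 90,                # longest expected gap between values, in seconds
        "value_type": "uint",          # the value is an unsigned integer
        "units": "N",
        "slope": "both",
        "format": "%u",
        "description": "Random number for demonstration",
        "groups": "example",
    }
    return [descriptor]


def metric_cleanup():
    """Called once when gmond shuts down; nothing to release here."""


if __name__ == "__main__":
    # Exercise the module by hand, outside gmond.
    for d in metric_init({}):
        print(d["name"], d["call_back"](d["name"]))
```

For the temperature example, only `random_callback` would change: it would read the sensor instead of drawing a random number.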
18:00
So back to the temperature example, you would write some code to get the temperature reading, and then you would init your metric and just feed it in. So the time_max here is just how long it takes before you would feed it the data
18:22
and the unit here is just an integer. It's actually pretty straightforward. So on the wiki, we have some documentation on how to write these modules. So Ganglia is designed to monitor
18:43
a lot of computers. So we noticed that when you have a thousand computers, you start to have this scalability issue. So the problem is that by default, you have like 30 metrics, and if you have a thousand computers,
19:01
that's like 30,000 metrics. So each metric, when it's collected, has to be written out to RRD files. So that means there's a lot of I/O happening. So previously what we did was to put the RRD files on tmpfs. So it's basically just RAM, so it's really speedy. So that sort of alleviated the problem.
19:22
So it could still continue functioning. But the problem is if you put your RRD files in tmpfs, then once the gmetad server reboots, it's all gone. So basically, you need to sync it to disk so that you keep this historic data.
19:41
So in the new version of RRDtool, there's a new daemon called rrdcached. Basically what it does is hang on to the writes to these RRD files, so it will hold a couple of updates until a specific time has passed
20:03
or there are enough updates, and then write them out at once. So in that case, it sort of buffers the writes, and it reduces the I/O level by quite a bit. So if your gmetad is monitoring over a thousand hosts, then this is something that you can consider.
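The buffering idea described above (hold updates until enough have piled up or enough time has passed, then write once) can be sketched as follows. This is a toy model of write coalescing, not rrdcached's implementation; the class name, thresholds and file path are hypothetical.

```python
import time


class BufferedWriter:
    """Hold updates per file and write them out in batches."""

    def __init__(self, write_batch, max_pending=10, max_age=300.0,
                 clock=time.monotonic):
        self._write_batch = write_batch   # callable(path, list_of_updates)
        self._max_pending = max_pending   # flush after this many updates
        self._max_age = max_age           # or after this many seconds
        self._clock = clock
        self._pending = {}                # path -> (first_seen, updates)

    def update(self, path, sample):
        first_seen, updates = self._pending.setdefault(path, (self._clock(), []))
        updates.append(sample)
        too_many = len(updates) >= self._max_pending
        too_old = self._clock() - first_seen >= self._max_age
        if too_many or too_old:
            self.flush(path)

    def flush(self, path):
        _, updates = self._pending.pop(path, (None, []))
        if updates:
            self._write_batch(path, updates)


writes = []  # stands in for the disk: one entry per physical write
writer = BufferedWriter(lambda path, ups: writes.append((path, len(ups))),
                        max_pending=3)
for value in range(7):
    writer.update("node1/load_one.rrd", value)
```

Seven updates result in two batched writes of three samples each, with one sample still held in memory; with 30,000 metrics that is the difference between 30,000 small writes per interval and a fraction of that.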
20:22
So it's better than the tmpfs approach because you don't need to sync the files, and it's just a better, more well-rounded solution. Okay, so I'm not gonna try to
20:44
give my own definition of cloud computing because there are already so many, but I guess I'll just talk briefly about how we, as the Ganglia project, want to address this. So basically, for us in Ganglia,
21:01
the cloud environments are dynamic. Ganglia was designed to monitor clusters and grids, which are pretty static. So you provision a host once, and basically you don't think of it going away. You just keep adding more hosts.
21:21
So yeah, basically we need to figure out how to handle this dynamic nature. So in terms of networking, there's no multicast support. By default, Ganglia uses multicast. Obviously you can use unicast, which I mentioned, but if you can't use multicast, it changes how you set it up.
21:42
And all these cloud computers basically have WAN IP addresses. So when you configure them, for load balancing purposes, you need to have some way of bootstrapping the configuration. So maybe when your cloud host boots up,
22:01
it'll talk to a centralized server to figure out, okay, which host should I send my metric data to. So these are some things that we need to think about. And going back to the dynamic nature, you have this host_dmax, which basically tells Ganglia how long to hang on to a host.
22:22
So in a typical cluster environment, you actually do want to know when a host goes down, but in cloud environments, you ramp up and ramp down pretty quickly. So do you really want to keep track of hosts that way?
22:40
So basically, you adjust host_dmax and it just sort of ignores that the host is gone. So, okay, I'm gonna now hand over to Daniel, who's gonna give a user testimonial.
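The `host_dmax` behaviour described just before the hand-over can be pictured with a small Python sketch. It assumes that a value of 0 means "never forget a host" and that any other value is a number of seconds of silence after which the host is dropped; the function name and the timestamps are invented for this note.

```python
def live_hosts(last_heard, now, host_dmax):
    """Return the hosts still tracked, given last-heard times in seconds.

    Assumption for this sketch: host_dmax == 0 keeps every host forever;
    otherwise a host silent for more than host_dmax seconds is dropped.
    """
    if host_dmax == 0:
        return dict(last_heard)
    return {h: t for h, t in last_heard.items() if now - t <= host_dmax}


seen = {"static-node": 1000, "cloud-vm-17": 400}   # made-up timestamps
cluster_view = live_hosts(seen, now=1060, host_dmax=0)
cloud_view = live_hosts(seen, now=1060, host_dmax=300)
```

With `host_dmax=0` the vanished virtual machine stays listed (and would show as down), which suits a static cluster; with a 300-second limit it simply disappears, which suits hosts that come and go.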
23:24
Okay, my name's Daniel Pocock. I'm working at a large bank in London. I've been deploying Ganglia. I've also been involved with the Ganglia open source project for virtually the whole time I've been working on this project for my employer.
23:43
And so I'm currently working as the release manager in the project as well. I've been doing that for about 18 months. Using open source methods is a key part of the job. It's something that was discussed right at the beginning at the interview stage.
24:02
And I indicated that this is the way that I work. And they were quite keen to pursue that with me. So we're just gonna talk a little bit about what we've done with Ganglia, the challenges we've faced, and also the aspects of working on an open source project
24:21
in a corporate environment. So every big company has a different attitude to open source software. And you've probably seen this, that some companies talk openly about their involvement with open source and with Linux.
24:41
And other companies are very sort of wedded to Microsoft in a big way. So there's obviously distinctions between the meaning of free software and open source software. In a business, and particularly in a bank,
25:01
financial concerns are important. So we'll just talk about free as in no price tag. In the good old days, people didn't have to worry too much about the cost of software. And they'd often buy things based on the support contracts, the size of the vendor, and various other factors.
25:24
These days, people are looking at a wider range of options. I don't think I need to go into the reasons behind that. But open source software is being looked at a lot more seriously. And where open source software provides a credible alternative,
25:42
people have to look at it. On the other hand, using public email lists, IRC chat, sharing code on the public internet, these create issues for many organizations. They create issues of how the company is portrayed
26:05
on the internet, and of the sharing of intellectual property. These are all challenges for different people in the company, some of whom are not from a software development background, to put it mildly.
26:21
Ganglia has provided a compelling reason to have those debates in the organization where I'm working. And it has a relatively unique status. And we'll look at some of the reasons for that. It's not highly controversial because it's a monitoring tool.
26:42
So it's not the core business of the company. The core business is banking and not system monitoring. So it's not a big loss if we're collaborating on a system monitoring tool. So we can do that with the Ganglia project
27:02
quite effectively because of the modular nature of the project. As Bernard mentioned before, with version 3.1 of Ganglia, you can develop your own metrics as modules in C or Python. And you can feed metrics in with gmetric.
27:21
So if we have a need to develop a metric that uses proprietary code, we can do that. And that code can be separated using the module interface. So if we want to share parts of the common agent code, as long as that module interface is stable,
27:44
then we can separate those things very easily. Just looking at the large enterprise environment, you've got a mix of different platforms.
28:00
You've got platforms from different generations. So you have some machines running recent versions of Red Hat. You'll have other machines running, say Windows NT4, for example, which is quite an old system. So if you look around in an organization that's large enough, you will find a little bit of everything. I mean, you'll find mainframes if you look around.
28:26
The users have a whole range of different concerns. They're particularly concerned about something that might make their system less stable, that might steal resources from their application,
28:41
or that might add complexity to managing their hosts. Fortunately, the Ganglia agent is lightweight. It runs on many of the platforms in a big environment. The source code can be tweaked, if necessary,
29:02
because it's open source. So if you have a particular need, we can recompile it for a particular platform. If we don't want to use a particular library or something, some of the libraries can be taken, some of the libraries can be disabled. So the PCRE support, which has been added recently,
29:23
is a purely optional feature. So we can disable that. Some of the challenges that we face using the Ganglia product in particular: it's heavily reliant on DNS. Once again, big organizations have
29:41
a range of DNS problems. They're not connected directly to public internet DNS servers. If you've had a lot of mergers and other corporate activity, then you may have several different DNS zones within the organization, and they might be separated over different firewalls. There may be overlapping IP space
30:03
and a whole range of things. Now, Ganglia relies on reverse DNS lookups, and it relies on the host names to generate file names for the graphs and to generate the URLs for looking at those graphs. So when you have a lot of DNS-related issues
30:22
in your network, then those will be reflected in how you manage Ganglia. It's not clear how Ganglia is intended to perform with short polling intervals. While looking through the gmetad code recently,
30:40
I found some cases where poll intervals have been randomized by five seconds either way. But if your polling interval is, say, five seconds, and you randomize by five seconds, then you could reduce the interval to zero, or you could increase it to 10. So I found that wasn't very effective,
31:01
so we decided to tweak some parts of the code to handle that, but there may be more attention needed to deal with that. You've seen the example before with the hosts grouped into clusters and grids, but when you install the Ganglia package on the host,
31:23
how does it know which cluster to join? The current version of the agent is configured using a static text file, so you can include a text file in the package. You can also use a tool like Puppet. If you have a Unix platform,
31:41
and you have Puppet across your whole network, you can use that to join different hosts to different clusters. But in an organization that has Windows, and that has some hosts that are quite old, deploying Puppet would significantly magnify the effort of installing ganglier,
32:01
because then you've got to install two products and not just one. So that's another challenge that we're looking at in the Ganglia project. To run Ganglia on Windows, you currently need Cygwin. The good news is that, using Cygwin, it does work, and it's quite effective.
32:22
The bad news is that there are issues with having multiple Cygwin applications on the same host. So once again, if you've got a lot of Windows hosts, and if some of them have been around for a long time, and some of them are quite new, and they're all running different applications,
32:41
you may not know if some of them already have Cygwin. And so when you put the Ganglia agent on there, you could break something else. So once again, the Cygwin DLL is a challenge that we need to deal with. With the project that I've been working on and participating with the open source community,
33:00
we've been able to discuss many of these issues, and to find ways to manage them. Some of that work has been contributed back to the open source project. So I'll just bring Bernard back now to wrap things up, and then we can go into some questions
33:21
or some further demonstrations. Okay, so thanks Daniel. So yeah, so how can you guys get started
33:41
if you wanna try it out? So I guess the easiest way is to just use the prepackaged stuff, so the Debian packages, like Red Hat, Fedora, SUSE, I mean they've all been around for some time. And even on Solaris, I think recently, Daniel did a lot of work. Actually, it works now, right?
34:01
The OpenCSW, so you can get it for Solaris even. So yeah, as I mentioned before, even though you have different distributions, you just install the packages, they should just work. The only issue is sometimes with the 3.1 version,
34:21
you can't really mix it with 3.0, but that's just some subtle issues. So yeah, again, just install gmond on all the computers you want to monitor for metrics, and dedicate one server for gmetad and the web server, and basically, you're done. So in theory, you don't really even need a configuration file for gmond,
34:41
but you can use it if you want. So if you want the bleeding edge stuff, or somehow there are no prepackaged packages, then you can download the source tarball, or even from our repository, and then just build it.
35:01
So the website is ganglia.info, where you can see what we've been doing, and there's a wiki and a SourceForge web page. So we provided this framework for you to monitor systems, and also these custom ways of feeding these metrics.
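As a rough illustration of what such a custom metric can look like, here is a minimal sketch. It assumes the Python module convention used by gmond 3.1 and later: a `metric_init` function that returns descriptor dictionaries, plus a `metric_cleanup` function. The metric name `spool_file_count`, the `watch_dir` parameter, and the choice of measurement are hypothetical and not taken from the talk.

```python
import os

# Hypothetical default; a real deployment would pass this in via module params.
_watch_dir = "."


def spool_file_count(name):
    """Callback that gmond would invoke; returns the metric's current value."""
    return len(os.listdir(_watch_dir))


def metric_init(params):
    """Return a list of metric descriptors (assumed gmond module convention)."""
    global _watch_dir
    _watch_dir = params.get("watch_dir", _watch_dir)
    return [{
        "name": "spool_file_count",   # hypothetical metric name
        "call_back": spool_file_count,
        "time_max": 90,               # max seconds between reports
        "value_type": "uint",
        "units": "files",
        "slope": "both",
        "format": "%u",
        "description": "Number of entries in the watched directory",
        "groups": "custom",
    }]


def metric_cleanup():
    """Called on shutdown; nothing to release in this sketch."""
    pass


if __name__ == "__main__":
    # Stand-alone check without gmond: initialise and print each metric once.
    for desc in metric_init({"watch_dir": "."}):
        print(desc["name"], desc["call_back"](desc["name"]))
```

Running the file directly exercises the callback outside gmond, which is a convenient way to check a module before deploying it. Before writing one, it is worth checking whether the community has already published a module for the same service.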
35:22
So there actually is a community around it to sort of write their own metrics, so you don't have to reinvent the wheel. So there's people writing, as I mentioned before, metrics to monitor Apache or memcached, there's a whole bunch of these custom metrics that have been created already,
35:42
so you can check it out before you write it. So finally, there are a couple of mailing lists, Ganglia General, these are all hosted on SourceForge. Ganglia Developers is the developer mailing list, and we are on IRC, on Freenode. There's a Twitter aggregation feed.
36:02
And so actually one group of people I'm particularly interested in inviting to sort of join the project is, I guess you have seen our front end. I mean, it's been like that since probably the past five to 10 years. I mean, it's pretty functional,
36:21
but you know what, as the project goes on, what would be nice is a way to customize what you see on the front end. So there's already work done to make the front end more modularized so that you can customize it, because depending on your company's organization,
36:40
some group of people will want to see your Ganglia graphs in one way, and then another group may want to see them in different ways. So it would be nice to sort of provide some mechanism so that it's very easy to customize without even writing any code or anything. So if there are any Ajax or JavaScript gurus who are interested in working on a front end project,
37:02
let us know. So I think with this, I'd like to thank Daniel for helping me prepare the slides. I mean, he did all the stuff, and thank you, FOSDEM, for inviting us to give this talk, so thank you.
37:22
So I think we have time, if you have any questions. Maybe I should be looking at the mailing list,
37:42
but we're using Ganglia with IPv4. Are there any issues moving to IPv6? Sorry, say that again. We're using Ganglia with IPv4. Okay. Are there any issues in moving to IPv6? You said RH, okay, sorry.
38:03
Are there any issues on moving to IPv6? Have you seen installations using IPv6? IPv6? Yeah. Actually, that's a very good question. I don't, I'm not aware of any, I would assume that as long as your network,
38:23
you know, your operating system supports it, I'm not sure if we need to make any modifications to the code, because it's just, actually, does anybody? You guys know? Have you guys, yeah. So yeah, maybe take this, we can take this offline,
38:40
because I haven't, I'm not aware of anybody sort of needing this, you know, use, yeah. I know IPv6 does at least somewhat work, because it broke BSD fairly badly when the port went in,
39:02
so I know it's in there. So, like, you're saying that it works with Ganglia just by default? Yeah, because I haven't tested it. I actually haven't seen much traffic about it, so, but, I mean, definitely try it out, and if.