
Linux kernel debugging for sysadmins


Formal Metadata

Title
Linux kernel debugging for sysadmins
Number of Parts
95
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
A deeper understanding of the Linux kernel helps sysadmins debug issues, tune systems and provide better root cause analysis. This talk provides insights into investigative methods and tools that Linux admins can use to understand a bit more about the systems they manage.
Transcript: English (auto-generated)
Yes, so once again, welcome to the next lecture here at FrOSCon 12. I'm pretty sure it's going to be great, because it's Minto Joseph talking about Linux kernel debugging for sysadmins,
which is quite an interesting topic from my point of view. Just one more thing before we start: if you liked the talk, or even if you disliked it (which I don't believe), please provide us some feedback. Log on to the
program software for FrOSCon and give us some feedback, because it really helps us to organize this event. So, enough talk. Please give a warm hand for Minto Joseph. Hello.
So, how many sysadmins here? Good. How many kernel hackers? I saw one. Okay, okay, you are going to answer the Q&A.
So this talk is based on my experiments and my experience in my sysadmin job, as somebody who has been managing systems for quite a long time.
I hope this will help sysadmins who are currently on the verge of trying to do something in the kernel to do more. I currently work with a Canadian company called Pythian.
So I will start my talk. This is the agenda, what I'm trying to cover: some basic investigation methodologies, some common issues and some tools. Pretty simple.
And why should sysadmins do kernel debugging? Of course, to learn more about the systems we manage, to debug efficiently when you face an issue, and for root cause analysis. This is becoming more and more
important nowadays, with more focus on SRE and DevOps kinds of workflows that integrate post-mortem analysis into the workflow. So this is what I'm going to cover. You all must have seen some user complaining, or maybe
a Nagios alert, saying: my system is not responding. The whole talk is based on this. I have split the investigation into two parts:
before reboot and after reboot. Why reboot? Because many times you end up rebooting your system if you face a kernel issue. Not always, but many times. So let's talk about that.
I'm just going to talk about stuff we generally do. You basically identify whether the claim is actually right: whether it is a system issue or a service issue. A user might be complaining that the app or the system is not working, but that doesn't mean the system is wrong. You do the basic
checking, the basic magic with your telnet and ping toolset, and see whether the issue is a system issue or a service issue. Then you will check
what you see on the screen. If you have any kind of provider console, you will check it; if you have a VM, you will check the VM console; or the KVM.
Anybody heard of KVM? Somebody with a gray beard probably knows it: the keyboard-video-mouse switch we used to switch between different monitors in data centers. You check the screen and see what the issue is, if possible. Network connectivity, definitely: you will try to
see how the connectivity is, whether the network connectivity issue is within the system, using ethtool or ifconfig and the logs, or whether there is any connectivity issue to the system. And if we check all of this and identify that it is a system issue,
what do we do next? There could be cases where the system is totally stuck. You are able to connect to the system, but it's not responding at all. You may see something on the screen, or maybe nothing on the screen,
but it is not responding to anything. You may or may not be able to connect over the network. What would you do? You can try SysRq. So why SysRq?
SysRq is a kernel mechanism which allows you to send keys, generally called magic keys, that make the kernel dump useful information or do useful things,
like syncing your file systems, or even panicking your kernel. Why that is useful, I will come to later. But let's have a quick look at SysRq. How about now?
Sorry, a bit bigger? Yeah, okay. Okay, so this is the SysRq kernel
sysctl parameter. You can enable it, and once it is enabled, if you have access to a keyboard, you can type Alt+SysRq and the letter key.
Another way to send a SysRq request is to echo the letter into the trigger file. For example, echo m > /proc/sysrq-trigger will dump the memory information to your syslog.
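A small Python sketch of the two SysRq interfaces described above, assuming a Linux host with /proc mounted. decode_sysrq interprets the kernel.sysrq value using the bit meanings in the kernel's SysRq documentation: 0 disables SysRq, 1 enables everything, and larger values are a bitmask. trigger_sysrq is a hypothetical helper that does the same thing as echo m > /proc/sysrq-trigger and needs root. According to the same documentation, the kernel.sysrq mask only restricts the keyboard combination. Root can always write to /proc/sysrq-trigger.

```python
from pathlib import Path
from typing import List

# Bit meanings of the kernel.sysrq sysctl, as listed in the kernel's SysRq docs.
SYSRQ_BITS = {
    2: "console logging level",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value: int) -> List[str]:
    """Return the SysRq function groups that a kernel.sysrq value enables."""
    if value == 0:
        return []          # SysRq completely disabled
    if value == 1:
        return ["all"]     # every SysRq function enabled
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def trigger_sysrq(key: str, trigger: Path = Path("/proc/sysrq-trigger")) -> None:
    """Send one SysRq command, like 'echo m > /proc/sysrq-trigger' (needs root)."""
    if len(key) != 1:
        raise ValueError("a SysRq command is a single character")
    trigger.write_text(key)
```

For example, decode_sysrq(176) reports sync, remount read-only and reboot/poweroff. Calling trigger_sysrq("t") would request the task-state dump discussed next.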
Let's have a look. You can see the memory information is dumped here. What can we do with this? I will talk about how to make sense of it later.
Don't be alarmed by the wall of text. We can also dump the thread state information of all the processes. Let's give it a couple of seconds. Okay, you can see
the process trace information. You can see the process name and the state of the process: whether it is sleeping, in uninterruptible sleep (the D state), or runnable (the R state). Then you can see the code path
the process was executing. What to do with the code path? If I want to understand what is going on, I just look at the kernel functions. These are the kernel functions. I will maybe go into the kernel tree, and
I have already run ctags here. A lot of people, oh yeah, it's a different window. How about now, is it good? So I have already run ctags here. It basically creates tag information
so that a text editor like Vim can use it. For example, I'm just looking for vfs_write, so I can go right into the code. If I want to see other instances of
the code, I can easily look them up. A lot of people use cscope; I don't, because I'm not a kernel developer, so I don't have to search that much code. Okay, so that is SysRq.
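To get from a trace to the source quickly, you can pull the function names out of the trace and hand them to ctags. This sketch assumes two things. First, frames are printed in the usual symbol+offset/size form, for example vfs_write+0xb8/0x1a0. Second, the kernel tree has already been indexed with ctags, so vim -t NAME is run from that tree. trace_functions and vim_tag_commands are hypothetical helper names. The sketch skips frames prefixed with "?", because x86 stack dumps use that mark for addresses the unwinder is not sure about.

```python
import re
from typing import List

# A kernel stack frame usually prints as symbol+offset/size, e.g. "vfs_write+0xb8/0x1a0".
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def trace_functions(trace: str) -> List[str]:
    """Function names from a call trace, in printed order, without duplicates or '?' frames."""
    names: List[str] = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue
        # A '?' right before the symbol marks a frame the unwinder is unsure about.
        if line[:match.start()].rstrip().endswith("?"):
            continue
        if match.group(1) not in names:
            names.append(match.group(1))
    return names


def vim_tag_commands(functions: List[str]) -> List[str]:
    """Shell commands that open each function through the ctags index ('vim -t TAG')."""
    return [f"vim -t {name}" for name in functions]
```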
We will talk later about how this can be more useful: finding the arguments which are passed from one function to another, or the arguments of a particular function. Let's continue. So, after the reboot,
we all check the syslog, of course: your /var/log/messages, your /var/log/kern.log. Then you check sysstat. I hope everybody has sysstat installed on their systems. Even if you have a fancy metrics system, definitely install sysstat and include it in
your AMIs, your images. I'm pretty sure most people know it, but I'll just show it: sysstat includes the sar command, with which we can look at the system
load information here. You can see whether the load was in user space, system space, iowait, all this stuff. This is stuff we generally do.
You can check memory stuff like this. If anybody has more questions about any of this, you can ask after the talk or during the Q&A. I'm pretty sure that many of you have worked with OS vendors and have provided a vmcore at one time or another.
You are sysadmins, after all. A vmcore is a memory dump of your Linux system; we'll talk about vmcores later. Now let's just look at a few things, like panics and
kernel-related issues, which you might see. One you must have seen in your working life: this is a normal panic. It's a very old panic, and I chose it for a reason: it's pretty straightforward and easy to explain. So this is a
"kernel BUG at" message, and you can see the file name in the source code and the line number. This kind of panic generally happens when there is a BUG_ON condition in the kernel source and your code ends up hitting it.
I'm showing this just to get you a bit more familiar with the structure of a panic, so that the next time you look at one, you can make more sense of it. So this was the CPU
which was executing at that time, and these are the modules loaded in the system at the time of the issue. You can see whether a particular module has any kind of flags, like proprietary or force-loaded.
You can see the PID of the process which was running at that time. But a process cannot panic a kernel, period. A process's behavior can trigger a panic, but a process from user space cannot panic a kernel. If that happens, there is a problem,
and the problem is in the kernel. Then you can see the kernel version, and whether the kernel was tainted by a proprietary module or similar. And this is the most important thing if you are starting out with this and just want to google it:
don't google these, just google this. This is the instruction pointer: the function which was being executed when the panic happened. On 64-bit you will find RIP; on 32-bit you will see EIP.
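You can pull the fields called out above from a saved oops or panic message with a few regular expressions. This is only a sketch. The patterns assume common x86 layouts: "kernel BUG at file:line!", "Pid: N, comm: X" or "PID: N Comm: X", "RIP:" on 64-bit and "EIP is at" on 32-bit. Other variants will be missed. summarize_panic and PanicSummary are hypothetical names. The ip_function field is the string worth searching for.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")
BUG_RE = re.compile(r"kernel BUG at ([^:\s]+):(\d+)!")
PID_RE = re.compile(r"PID:\s*(\d+),?\s*Comm:\s*(\S+)", re.IGNORECASE)


@dataclass
class PanicSummary:
    bug_at: Optional[Tuple[str, int]]   # (source file, line) from "kernel BUG at"
    ip_function: Optional[str]          # function named on the RIP (64-bit) or EIP (32-bit) line
    pid: Optional[int]
    comm: Optional[str]
    tainted: bool


def summarize_panic(text: str) -> PanicSummary:
    bug = BUG_RE.search(text)
    who = PID_RE.search(text)
    ip_function = None
    for line in text.splitlines():
        # 64-bit oopses print "RIP: ...", 32-bit ones print "EIP is at func+off/size".
        if "RIP:" in line or "EIP is at" in line:
            frame = FRAME_RE.search(line)
            if frame:
                ip_function = frame.group(1)
                break
    return PanicSummary(
        bug_at=(bug.group(1), int(bug.group(2))) if bug else None,
        ip_function=ip_function,
        pid=int(who.group(1)) if who else None,
        comm=who.group(2) if who else None,
        # Untainted kernels print "Not tainted" (no "Tainted:"), so this matches only tainted ones.
        tainted="Tainted:" in text,
    )
```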
I will skip the whole register part here and go to the call trace again. This call trace is like the one you saw in the SysRq thread-state output. You can see a system call function here: most, probably all, of the system call
entry functions in the kernel are named sys_ followed by the system call name. In this case, two processes were trying to do something, and there was a race condition which caused the issue.
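Following the sys_ naming convention described above, a naive sketch can name the system call through which a trace entered the kernel. It assumes frames look like sys_write+0x51/0x90. It also accepts the SyS_ wrapper spelling used by some older kernels and the __x64_sys_ or __ia32_sys_ prefixes used by newer x86 kernels. syscall_from_trace is a hypothetical helper.

```python
import re
from typing import Optional

FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")
# sys_write, SyS_write (older wrapper) and __x64_sys_write (newer x86_64) all mean write(2).
SYSCALL_RE = re.compile(r"^(?:__x64_|__ia32_)?(?:sys|SyS)_(\w+)$")


def syscall_from_trace(trace: str) -> Optional[str]:
    """Name of the first system-call entry function found in a call trace, if any."""
    for function in FRAME_RE.findall(trace):
        match = SYSCALL_RE.match(function)
        if match:
            return match.group(1)
    return None
```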
We'll not get into how to debug a kernel panic. That's a very big topic, and I'm not a hundred percent qualified to do that, but I can give you pointers on how to do it. I have some specific knowledge about some specific subsystems, and if you have any questions,
I can try to point you to people or resources. So this is a soft lockup. Again, why does a kernel panic happen? A kernel panic happens when the kernel thinks that at this point in time it
cannot properly recover the system: if it continues, this might cause data loss. That is when the kernel panics. That is the standard
definition; there can be a lot of different logic below that, but this is the baseline. Next is the soft lockup. A soft lockup doesn't always make the system unusable. A soft lockup usually happens when the kernel tries to evict a process from the CPU
but is not able to, so the process keeps running on that CPU, here for 10 seconds. Here you can see the instruction pointer is EIP, because it's a 32-bit system. You can see the same pattern: the CPU,
the register information, the call trace. One more thing: you can see that some of this code comes from one specific module, so this is also useful information.
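You can extract the soft-lockup header and any module-owned frames the same way. The sketch assumes the message reads "BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]". It also assumes that frames belonging to a loadable module end with the module name in square brackets. parse_soft_lockup and SoftLockup are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]]+)\]")
# Frames from a loadable module end with the module name in brackets: "func+0x23/0x40 [mod]".
MODULE_FRAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+[ \t]+\[(\w+)\]")


@dataclass
class SoftLockup:
    cpu: int
    seconds: int
    comm: str
    pid: int
    modules: List[str] = field(default_factory=list)  # modules seen in the trace


def parse_soft_lockup(text: str) -> Optional[SoftLockup]:
    match = LOCKUP_RE.search(text)
    if match is None:
        return None
    # comm itself may contain ':' (e.g. "kworker/2:1"), so split on the last one.
    comm, _, pid = match.group(3).rpartition(":")
    modules: List[str] = []
    for module in MODULE_FRAME_RE.findall(text):
        if module not in modules:
            modules.append(module)
    return SoftLockup(int(match.group(1)), int(match.group(2)), comm, int(pid), modules)
```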
And I'll show you an example of a hung task. A hung task happens when a process is in this state for more than 120 seconds. What is this state?
This state is uninterruptible sleep (D). It basically happens when a process is waiting on I/O; there can be other cases as well, but mostly it is I/O.
So here the process has been waiting in this state for more than 120 seconds. In some cases this can be expected behavior: your process might be supposed to wait in D state for a long time. In those corner cases you can just disable the hung task check.
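A quick way to see which processes are in D state right now, before the 120-second hung-task warning fires, is to read the state field of /proc/<pid>/stat. This sketch assumes the standard Linux /proc layout and only looks at processes, not individual threads. The helper names are hypothetical. The timeout is stored in /proc/sys/kernel/hung_task_timeout_secs, which exists when the kernel is built with hung-task detection. Writing 0 there is the usual way to disable the check.

```python
from pathlib import Path
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the state is taken from just after the *last* ') '.
    """
    pid, _, rest = line.partition(" (")
    comm, _, after = rest.rpartition(") ")
    return int(pid), comm, after.split()[0]


def d_state_processes(proc: Path = Path("/proc")) -> List[Tuple[int, str]]:
    """(pid, comm) of every process currently in uninterruptible sleep (state D)."""
    found = []
    for stat in proc.glob("[0-9]*/stat"):
        try:
            pid, comm, state = parse_stat(stat.read_text())
        except (OSError, ValueError, IndexError):
            continue  # the process exited mid-scan, or the file was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


def hung_task_timeout(proc: Path = Path("/proc")) -> int:
    """Seconds in D state before a hung-task warning; 0 means the check is disabled."""
    return int((proc / "sys/kernel/hung_task_timeout_secs").read_text())
```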
A hung task doesn't necessarily make the system unusable, but in many cases it can. Here also you can see the process, the process state and the call trace. You can try to
read through the kernel code and the functions one by one and see how the code flows. Another one is out of memory. I'm pretty sure all of the sysadmins have seen an out-of-memory error. I have chosen this specific
message because it is from a very old kernel, so I don't have to explain much. Nowadays I would have to explain NUMA and all the newer kernel changes. But you can see
it is dumping the same information that the SysRq memory dump shows. Of course it has different values, but the pattern is the same. You can see
the number of active pages and the number of inactive pages. Active and inactive basically mean that the kernel uses LRU (least recently used) lists to identify whether a page is currently in use or not.
If the kernel thinks a page is active, it will be on the active list, and if the kernel wants to free a particular page, it will first move it to the inactive list before freeing it. So you have the active and inactive pages.
You have the dirty pages: dirty pages are pages in memory whose contents have changed but have not been written back yet. Writeback, if I remember correctly, are the pages in flight, which are currently being written back.
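Most counters in that OOM report have a live counterpart in /proc/meminfo, which this sketch parses. The field names below are the ones /proc/meminfo uses: Active, Inactive, Dirty, Writeback, NFS_Unstable, Slab, Mapped, PageTables and SwapFree. They are close to, but not spelled exactly like, the labels in an OOM message. Fields a given kernel does not report are skipped. The helper names are hypothetical. Typical use is memory_summary(open("/proc/meminfo").read()).

```python
from typing import Dict

# /proc/meminfo names closest to the counters printed in an OOM report.
OOM_FIELDS = ("MemFree", "Active", "Inactive", "Dirty", "Writeback",
              "NFS_Unstable", "Slab", "Mapped", "PageTables", "SwapFree")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: number}; most values are in kB."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(text: str) -> Dict[str, int]:
    """Keep only the OOM-related fields that this kernel actually reports."""
    info = parse_meminfo(text)
    return {name: info[name] for name in OOM_FIELDS if name in info}
```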
The unstable count is NFS-specific: these are NFS pages that have been sent to the server but not yet committed to stable storage. Then of course the free memory, and slab. Slab is basically a cache of object types
predefined in the kernel, so that these small allocations are served from contiguous pages. It usually holds the dentry information and all the kmalloc objects. You can run
the slabtop command on your system to see slab information. Mapped is the mapped pages, basically pages which have a file backing. Then the page table entries: they hold all the information on where the pages are
in memory. Then there is a list of pages, trying to show you where all the pages are allocated. Then you will see the different zones: DMA, Normal, HighMem. This is a 32-bit system;
that's why you see DMA, Normal and HighMem. So one thing: why did this system panic? It has a lot of memory. It has a lot of
free pages, a considerable amount of memory. Any idea why this system would face out of memory? When does a system face out of memory? Yeah? Sorry?
Yeah, yeah, it is when the system normally runs out of memory, right. Yeah, and also, yeah, so the question is from me. Yeah, okay, so you want to repeat my question or his question?
The answer? Okay, oh yeah, okay. So let's continue. I will tell you why this specific system panicked: because of the zones.
What are these zones? Traditionally we have the DMA zone, Normal and HighMem. It's a bit complicated topic, but a long time back some DMA devices could only access up to, I think,
16 MB of RAM, so the DMA zone was introduced for those devices. The rest of the kernel mostly works in the Normal zone, and HighMem is used
for mapping the rest of the memory, because a 32-bit system can theoretically only address 4 GB of RAM. It's a bit more complicated than that, but basically we have different zones like
DMA, Normal and HighMem, and in this case each zone has a few watermarks: min, low and high. When free memory goes below min and the kernel cannot reclaim enough, that is when an out-of-memory happens. In the end, that is the specific reason an out of memory happens.
When free memory goes below low, the kernel very actively tries to reclaim memory, using pdflush, kswapd, whatever kernel daemons the current kernel is running. It keeps trying to reclaim memory
until free memory reaches high. So in this case, even though HighMem had enough free memory, the Normal zone didn't, and this was a historical problem with 32-bit systems.
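You can check the watermark logic described above against /proc/zoneinfo, which lists "pages free" and the min, low and high watermarks for each zone. This is a simplified model that mirrors the talk's explanation. It is not the kernel's exact allocator check, which also accounts for lowmem reserves and allocation order. parse_zoneinfo and zone_pressure are hypothetical names, and all values are in pages.

```python
from typing import Dict, Optional

Zone = Dict[str, int]


def parse_zoneinfo(text: str) -> Dict[str, Zone]:
    """Collect 'pages free' and the min/low/high watermarks (in pages) per zone."""
    zones: Dict[str, Zone] = {}
    current: Optional[Zone] = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "Node" and "zone" in tokens:
            # Header such as "Node 0, zone   Normal" -> key "node0/Normal"
            current = zones.setdefault(f"node{tokens[1].rstrip(',')}/{tokens[-1]}", {})
        elif current is not None:
            if tokens[:2] == ["pages", "free"]:
                current["free"] = int(tokens[2])
            elif tokens[0] in ("min", "low", "high") and len(tokens) == 2:
                # Per-CPU pageset lines say "high:" (with a colon), so they are not matched.
                current.setdefault(tokens[0], int(tokens[1]))
    return zones


def zone_pressure(zone: Zone) -> str:
    """Simplified reading of the watermarks, following the talk's explanation."""
    if zone["free"] <= zone["min"]:
        return "below min: allocations must reclaim directly; OOM killer if that fails"
    if zone["free"] <= zone["low"]:
        return f"below low: kswapd reclaims until free reaches high ({zone['high']})"
    return "ok"
```

Fed the zoneinfo of a box like the one in the slide, this would flag the Normal zone while reporting HighMem as ok.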
Nowadays we don't have to worry about it. I just used this example from a 32-bit system so that I can explain that there are zones in our virtual memory. Then we have the buddy allocator. The buddy allocator information basically shows you,
for each zone, how much contiguous memory is available. If most free memory is in 4 KB chunks, the chance that the system memory is highly fragmented is high, and when
more free memory is in larger chunks, the memory is less fragmented. That is one more thing to watch, because some applications probably need
very large contiguous memory, so when they try to allocate a contiguous piece of memory, they can hit page allocation failures and stuff like that. Then you have the swap information, and then the process which was killed. For killing a process, normally a function called badness is used. It uses
a set of rules to assign points to the different processes in the system. For example, if a process is
niced, it is considered less important, so it is more likely to be killed; if the process is run by root, its chance of getting killed is lower. So the badness value is based on that. In current kernels we have
the oom_score_adj parameter, which can tune this behavior for each PID. For example, if you don't want your MySQL to be killed, you can write a value to
/proc/<PID>/oom_score_adj: setting it to -1000 (or -17 with the older oom_adj interface) makes sure that whenever the OOM kill happens, your favorite process will not get killed.
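Protecting a process such as MySQL from the OOM killer comes down to writing to /proc/<PID>/oom_score_adj. The valid range is -1000 to 1000, and -1000 disables OOM killing for that process. This is a minimal sketch with hypothetical helper names. Lowering the value requires root (CAP_SYS_RESOURCE). The resulting badness score can be read back from /proc/<PID>/oom_score.

```python
from pathlib import Path

OOM_SCORE_ADJ_MIN = -1000  # the OOM killer never selects a process at this value
OOM_SCORE_ADJ_MAX = 1000   # makes the process the preferred OOM victim


def set_oom_score_adj(pid: int, value: int, proc: Path = Path("/proc")) -> None:
    """Write /proc/<pid>/oom_score_adj; lowering it needs root (CAP_SYS_RESOURCE)."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [{OOM_SCORE_ADJ_MIN}, {OOM_SCORE_ADJ_MAX}]")
    (proc / str(pid) / "oom_score_adj").write_text(f"{value}\n")


def oom_score(pid: int, proc: Path = Path("/proc")) -> int:
    """The badness score the OOM killer currently assigns to this process."""
    return int((proc / str(pid) / "oom_score").read_text())
```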
Okay, then we have normal hangs. Hang is a very bad term, actually; it's a very abstract term. Here I have just listed hardware issues: there can be machine check exceptions, or EDAC (error detection and correction) events.
If you find something like machine check events, you can look at your logs and mcelog, and probably contact the vendor if it is a bare-metal system.
If it is a VM, check the hypervisor and talk with the hypervisor vendor. This doesn't happen often in a virtual machine; if it does, it's probably a bug. Then there can be high CPU, memory or I/O utilization, which can cause a hang. So these,
OOM or hung task, are also kinds of resource allocation issues, and you can use tools like sar to identify issues caused by high utilization of resources.
Let's get into vmcore a bit. For dumping a vmcore we need the crashkernel parameter in GRUB. Traditionally, a long time back, only netdump and diskdump were available: netdump dumped the memory
over the network, and diskdump dumped it to the local disk. Currently kdump is capable of dumping pretty much anywhere. kdump is configured in /etc/kdump.conf. kdump can only dump when there is a panic: it dumps
the memory of the system when there is a panic. So if you specifically want to debug one of the previous issues in detail, and you don't understand it from the
screenshots, you can intentionally panic the kernel. I'm not saying you should set these parameters every time; I'm just suggesting that if you have a recurring, ongoing issue and you want a deeper investigation, you can set parameters so that when there is a soft lockup, an OOM or a hung task,
the kernel panics and kdump dumps a vmcore. You can also do Alt+SysRq+C: previously I dumped the thread information with T, and if I do C instead, this will panic the system. I'm not going to do that now.
And that will dump a vmcore, by default into /var/crash. Okay, so let's have a, what's the time?
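Before relying on the intentional-panic approach described above, check that a crash kernel is actually reserved and loaded. This sketch reads the crashkernel= argument from /proc/cmdline and the state of /sys/kernel/kexec_crash_loaded. It also renders sysctl.conf lines for three panic switches: kernel.softlockup_panic, kernel.hung_task_panic and vm.panic_on_oom. The helper names are hypothetical. Panicking on these conditions is a deliberate, temporary debugging choice, not a default. With vm.panic_on_oom=1, constrained OOMs (cpuset, mempolicy or memory cgroup) still go to the OOM killer; setting it to 2 panics on those too.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Switches that turn these conditions into panics, so kdump can capture a vmcore.
PANIC_SYSCTLS: Dict[str, int] = {
    "kernel.softlockup_panic": 1,  # panic when a soft lockup is detected
    "kernel.hung_task_panic": 1,   # panic when the hung-task detector fires
    "vm.panic_on_oom": 1,          # panic instead of invoking the OOM killer
}


def crashkernel_reservation(cmdline: str) -> Optional[str]:
    """Value of crashkernel= on a kernel command line (e.g. '256M'), or None."""
    for arg in cmdline.split():
        if arg.startswith("crashkernel="):
            return arg.split("=", 1)[1]
    return None


def crash_kernel_loaded(sysfs: Path = Path("/sys")) -> bool:
    """True when a capture kernel is loaded and ready (kexec_crash_loaded is 1)."""
    return (sysfs / "kernel/kexec_crash_loaded").read_text().strip() == "1"


def sysctl_path(name: str, proc: Path = Path("/proc")) -> Path:
    """Map a dotted sysctl name to its file, e.g. vm.panic_on_oom -> /proc/sys/vm/panic_on_oom."""
    return proc.joinpath("sys", *name.split("."))


def sysctl_conf_lines(settings: Dict[str, int]) -> List[str]:
    """Render settings in the 'name = value' form used by /etc/sysctl.conf."""
    return [f"{name} = {value}" for name, value in settings.items()]
```

Typical use is crashkernel_reservation(open("/proc/cmdline").read()) followed by crash_kernel_loaded(), before adding the rendered lines to a sysctl configuration file for the duration of the investigation.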
Okay, so let's have a quick look at a vmcore. I have a vmcore here, which I have dumped from the system, and
for analyzing a vmcore I need a command-line tool called crash. crash is basically a wrapper around GDB, which you probably know. And this vmlinux I have extracted from the
kernel-debuginfo package, which has the debug symbols, unlike the vmlinuz file in your /boot. So you can see the GDB information. It gives you basic information on
the kernel: when it crashed (this was a long time back), and the uptime of the system at the time of the panic. You can see that the panic was caused by a SysRq. You can see the
kernel release information and the hostname. You can see the processes which were running on the system at the time of the issue: the PID information, the task, and the state of the different processes. RU is runnable,
IN is sleeping, and UN is uninterruptible sleep. If I want to see the trace of one particular process, I can just run bt with the PID.
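When a crash session's ps output is long, tallying the ST column quickly shows how many tasks were runnable (RU), sleeping (IN) or in uninterruptible sleep (UN), and which ones to run bt on. The sketch makes two assumptions. First, the ps columns are PID PPID CPU TASK ST %MEM VSZ RSS COMM, with a leading > marking tasks that were active on a CPU. Second, the output has been saved as text. The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def parse_crash_ps(output: str) -> List[Dict]:
    """Parse the text of crash's 'ps' command into one dict per task."""
    rows = []
    for line in output.splitlines():
        stripped = line.lstrip()
        active = stripped.startswith(">")  # '>' marks a task active on a CPU
        if active:
            stripped = stripped[1:]
        tokens = stripped.split()
        # Expected columns: PID PPID CPU TASK ST %MEM VSZ RSS COMM
        if len(tokens) < 9 or not tokens[0].isdigit():
            continue  # header line, or something that is not a task row
        rows.append({
            "pid": int(tokens[0]),
            "state": tokens[4],
            "comm": " ".join(tokens[8:]),
            "active": active,
        })
    return rows


def state_counts(rows: List[Dict]) -> Counter:
    """How many tasks were RU (runnable), IN (sleeping), UN (uninterruptible), and so on."""
    return Counter(row["state"] for row in rows)


def uninterruptible_pids(rows: List[Dict]) -> List[int]:
    """PIDs worth a closer look with 'bt <PID>'."""
    return [row["pid"] for row in rows if row["state"] == "UN"]
```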
Another interesting thing is the memory state at the time of the issue: you can see what was happening in the system when it was hung or in a panic state.
So this was the process which actually panicked the system; this was the context. You can see that it is bash, because I ran the command from bash: I did echo c > /proc/sysrq-trigger from bash.
You can see that there's a vfs_write into the proc file system. Now, what is VFS? VFS is an interface between the different file systems and the kernel. So here I was trying to demonstrate how to find an argument
which is passed. You can see the stack information like this. Sorry. This dumps the complete stack of this call trace. You can see that
after sys_write
it's calling vfs_write, and there is stack information here. On its own this doesn't make sense; these look invalid. So let's see what vfs_write takes. This is the code; I can just go to the source as well. You can see that it takes a
struct file as its first argument, so I'm going to take that address and display it as a struct file.
Then you can see the structure, whatever there is in it. Here you can see the UID, the PID (it's probably wrong), all this information. What's interesting here is the dentry. The dentry is
where your directory structure is stored, for instance; that's one job of the dentry. So let's check struct dentry and see what is in there.
Type, oh. You can see the name. So this is what I was looking for: you can see that the dentry has the information of
well, we obviously know that I ran sysrq-trigger, so it made sense to dissect this particular thread, this particular process. You can see that this was the name, and here in d_iname we can also see the name of the
file that was being accessed. Another thing you can do is look at the task details. For example, here is a yum process, and there is a command called task: you can see the task-related information,
timestamps. You can do a lot with this, but all the task-related information is here as well. So, sorry.
Okay, okay, yeah. So this is basically what I was planning to cover. I think I was super fast. Yeah, any questions? Oh yeah, let's look at help: we have runq.
runq shows you which processes were on each CPU's run queue at the time:
not only the process which was running, but the processes which were in the runnable state, and the current process as well. Then what else do we have? What else am I familiar with?
Yeah, I think the vm information. Yeah, of course mount: the VFS mount information is also there, so if you want to see that, sorry, oh, sorry.
So I probably have to get it from a stack, not from here. Yeah, so you can, I have to check; I think it is vfsmount, I don't remember exactly, but
yeah, we have this mount command, so you can go through the mount information as well. Stuff like that.
Or ipcs is there, for your shared memory information. Any other questions? Yeah? Can you please repeat?
Yeah, yeah, it's
I don't understand. Okay, probably you got your answer. Yeah. Okay, any other questions?
If yeah
Yeah, so you're talking about the case where you have a UI. One thing I forgot to mention: if you want to run any of these SysRq commands,
you probably need to switch to one of the text consoles first, with something like Ctrl+Alt+F1. Okay, so the question is: you're using
a graphical interface, and if you face a hang, how do you recover? If you're having this issue, if it is an ongoing issue, and you suspect a large number of D-state processes, a lot of I/O, and you want to
find out what is happening, or specifically if it is a load average issue, there is a tool called hangwatch. What hangwatch does: you install it, and it monitors the load average. Load average is a very tricky subject: a high load average doesn't necessarily mean there is a problem. The load average is a
calculation based on the runnable processes and the uninterruptible-sleep processes, basically the R-state and D-state processes. So if your load average goes above, for example, 10, hangwatch will detect that and
automatically run the SysRq commands you configured in hangwatch. That is one possibility. Another thing is that the kernel itself has some methods to deal with this. You have probably seen the NMI watchdog:
if there is a problem which affects interrupts, it uses non-maskable interrupts (NMIs) to notice that interrupts are not making progress over time, and the kernel itself will dump an NMI watchdog error. If you have configured the NMI watchdog to panic the kernel,
it will panic the kernel. So the kernel also has its own mechanism. But specifically, if your problem is with the UI, I think you should try to connect through SSH; can you do that and run your commands? Yeah, you're saying you have a
GUI system, which you use mostly for the UI, like GNOME or something.
We visualize the first So
Are you able to reach the GNOME terminal? I think you can just press Alt+F2 and type r, which will restart the UI if it's GNOME. Do you know? Okay, nothing is moving.
After you recover the system, you can check the logs and see what was happening. You can usually find something in the logs, and based on that you can continue the investigation. There is no one way to do things; you just need to improvise based on what you have.
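The load average discussed in the hangwatch answer is an exponentially damped average of the number of runnable (R) plus uninterruptible (D) tasks. The kernel updates it roughly every 5 seconds. The sketch below models that update in floating point; the kernel itself uses fixed-point arithmetic. It also adds a threshold check on /proc/loadavg in the spirit of hangwatch. should_trigger and the other names are hypothetical helpers that illustrate the idea. They are not hangwatch's actual code.

```python
import math
from typing import Iterable, Tuple

SAMPLE_INTERVAL = 5.0  # seconds between the kernel's load-average updates
WINDOWS = {"1min": 60.0, "5min": 300.0, "15min": 900.0}


def update_load(load: float, active: int, window: float) -> float:
    """One damped update; active is the number of R-state plus D-state tasks."""
    decay = math.exp(-SAMPLE_INTERVAL / window)
    return load * decay + active * (1.0 - decay)


def simulate(samples: Iterable[int], window: float = WINDOWS["1min"]) -> float:
    """Feed a series of active-task counts, starting from an idle system."""
    load = 0.0
    for active in samples:
        load = update_load(load, active, window)
    return load


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """First three fields of /proc/loadavg: the 1, 5 and 15 minute averages."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def should_trigger(loadavg_text: str, threshold: float = 10.0) -> bool:
    """Hangwatch-style check (hypothetical): is the 1-minute load above the threshold?"""
    return parse_loadavg(loadavg_text)[0] > threshold
```

Consider a steady 10 active tasks on a previously idle system. The 1-minute load reaches 10 * (1 - e^-1), about 6.3, after one minute. This lag is one reason a threshold like 10 reacts to sustained pressure rather than brief spikes.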
Anything else? Thanks for answering the questions. Anything else? Good.