CALIFORNIA STATE UNIVERSITY, LONG BEACH

Geography 458/558: Hazards and Risk Assessment

Lab: Tracking COVID-19 cases in the Calc spreadsheet

==========

COVID-19 is the natural hazard dominating the Fall 2020 semester. I wound up collecting data on cases, deaths, and acute care hospitalizations each night, starting around March 19th. At that time, the disaster was in its early but exponentially rising phase and the public health community was concerned about emergency facilities being overwhelmed with COVID-19 patients. I had a database of all (California) hospital beds by type and which beds could potentially be converted to emergency COVID use. The sense back then was that the hospitals could just barely manage the flow of patients if COVID would only double at a 7 day rate. At the time, cases were doubling every 2 or 3 days. So, I began projecting what the caseloads would be at various doubling times from the numbers of actual cases on the 19th of March.

My nightly data scrape came from https://ncov2019.live, a project by a teenage coder, Ari Schiffmann, in Washington State, who has automated the collection of data from CDC, WHO, local health departments, and another coder in Ireland (Daniel Conlon). I also collected data directly from the Los Angeles County Department of Public Health at http://publichealth.lacounty.gov/media/Coronavirus/locations.htm. On a few dates when one of these sources was down, I would collect data from other sources, such as Reuters or Johns Hopkins, but their reports come out at different times of day than I normally hunt data, so there are a few small discrepancies.

Goals for this lab

  • Give you an opportunity to do a lab on the disaster affecting all of us so drastically this semester
  • Familiarize you with free open-source software (OpenOffice or, optionally, LibreOffice)
  • Build your skills in using a spreadsheet to take the drudgery out of calculations
  • Build your skills in visualizing data trends by creating one line chart of trends in US COVID-19 cases and three pie charts of comparisons between the US and the world in facing this disaster
  • Have you interpolate doubling times at the end of the data set and to extrapolate graphed trends to come up with worst-case and optimistic case levels ten days after the data set ends.

Getting your data and the wherewithall to process them

So, what you have here is a subset of my nightly data scrapes, consisting of case data for the US for 140 nights, as well as a few points of comparison with global data on Night 140. You can get the data at https://home.csulb.edu/~rodrigue/geog458558/labs/COVIDlab.ods.

You should also download the free OpenOffice open-source office software, which you can do for free at https://www.openoffice.org. It is easy to download and install. And it's free. If you are already very proficient in Microsoft Office's Excel, you should be able to open the file and do the lab, but the directions are specific for OpenOffice Calc and you'll have to figure out the translation on your own.

Processing your data

On opening the spreadsheet, you'll see that it contains the following columns: date, days (March 19th is Day 0), confirmed (meaning known cases of COVID-19), and then numbers 2, 4, 8, 12, and 16, which are each doubling times in days. On the right are three columns having to do with the situation of the US in comparison with the world on Day 140 (August 6th).

Date runs from 19 March through 6 August. Days represents days out from 19 March, just to make your graphing easier. Confirmed is the number of known cases and we have counts all the way down to the row for Day 140, when the US had topped 5 million cases.

Now you get to calculate the number of cases we would expect at various doubling rates for each day out from 19 March. Let's start with Day 0. Put your cursor in Cell D2 (under 2 day doubling time and across from Day 0). Type in the following formula EXACTLY:

=$C$2*2^($B2/D$1)

Where:
  • $C$2 will force all the formulas to use 8,865 for the actual caseload on Day 0. That's what the dollar signs are about: $C locks on Column C and $2 locks on Row 2.
  • 2 is because we're interested in case doubling
  • ^ means raising 2 to the exponent in the parentheses, building in compounding growth
  • Inside the parentheses, $B2 locks on Column B to give us the number of days out from March 19th
  • This is divided by D$1, which locks on Row 1, where the various doubling times are used as headers.
  • So, we're figuring out expected caseloads per day as a function of original case count on March 19th (Cell C2), doubling (Cells D1 through H1), days out from March 19th, and various doubling time scenarios (Rows B2 through 142).
When you hit enter, your answer should be 8865, the same number as you started out with in Cell C2. That's because days = 0 (Cell B2) and that cancels out the exponent. Which is fine for now.

If that IS the answer you got, put your cursor back in Cell D2 and move it to the lower right corner of the cell. You should see the arrow cursor turn into a crosshair. When it does, left-click and drag across the row from D2 to H2 (under the 16 day doubling time header) and let go. You should get a line of cells occupied by 8865.

Now for the exponential magic: Highlight all the cells from D2 through H2. Move your cursor to the far rightmost corner of the highlighted strip. Again, left-click and drag all the way down to Row 142. You can see why I made a couple columns so weirdly wide. If you had a doubling time of 2 days, you would have far, far more cases than we have humans, actually more than an order of magnitude more than we have stars in the universe!!! Estimates of stars in the universe are around 1024 and the number in cell D142 is more than 1025!!! That's exponential growth for you and it always eventually hits limits in the real world and levels off. For COVID-19, that would be when it runs out of people to infect, whether through killing people (its newly invaded habitat), sickening people who recover and have some kind of immunity for some amount of time, encountering people who've been vaccinated, or simply being denied access to new victims through masking, social-distancing, and copious washing.

Look how drastically the expected case counts drop if you lengthen the doubling time. The US would have more victims than residents all the way out to an 8 day doubling time but, at 12 days doubling, it would be down to 29 million people and, at 16 days doubling, it would be down to 3 million or less than we actually do have now. Basically, this is all about lengthening the doubling time and slowing down the flow of sick people into hospital beds so that more of them have a chance at adequate care and survival.

Graphing your data

Put your cursor on the grey boxes labeled B through H. If you touch the grey boxes, the entire columns will light up. When Columns B through H are lit up, click on the graphing wizard: the tiny icon in the toolbar up top that looks like a little bar chart (in some versions of Calc, it looks like a little pie chart).

A bizarre graph comes up and a dialogue box that lets you choose a chart type. Choose a Line chart (not the default bar chart). Accept the default dots option and hit Next.

This brings up the Data Range dialogue box. Your data are in a column series, which it defaults to, but you need to click on First row as label and First column as label. You should see your column headers in the legend box on the evolving chart. Then, hit Next. You can ignore the Data Series dialogue box and click Next again.

Now you are in the Chart Elements dialogue box. You can give your chart a snappy title (er, something like USA COVID Cases by Day and Doubling Times?). The X axis should be labeled something like Days since March 19th or similar. The Y axis should also be labeled something like Number of cases or COVID-19 cases. Now, hit Finish.

Plug-ugly, isn't it? That's because the spreadsheet tried to accommodate more victims than stars in the universe from the 2-day doubling time column. While the graph still shows a grey boundary with black dots in the corners, we can edit it. If you touched somewhere else outside the graph, the graph will simply be an object in the spreadsheet. If you touch it again, you'll see green dots. That means you can't edit it, but you can move it someplace more convenient if you like. To get back in editing mode, double-click on it and you should get the grey border and black boxes.

So, let's edit. First, put a "neat line" around the perimeter by touching in an empty area near the border. Right-click and select Format Chart Area. In the dialogue box Borders tab, you can click on Style and pick Continuous from the drop-down menu, modify Color if you like, and it's a good idea to specify Width at maybe 0.02". You can also mess around with background color for the whole chart Area and Transparency effects. You can also change the color of the chart wall where the dots are (white's a good choice for that!). You can unleash your inner artist, if you are so inclined.

Now, let's deal with that stars-in-the-universe problem. While the chart is active, touch one of those bloated numbers and right-click to Format Axis. In the dialogue box, pick the Scale tab and let's override the Automatic Maximum. Unclick Automatic Maximum and write in maybe 6,000,000 or 10,000,000 to accommodate the 5,000,000+ cases the US had as of August 6th.

QUITE an improvement, eh?

You might want to format the X axis, too: Calc may stack your days into 2 staggered rows, which looks bad. To do that, click on one of the numbers, right-click to get to Format Axis, and this time click the Label tab. Type 90 in the Text orientation area or move the blue dot to the top of the dial. Just a lot more legible.

You can change the colors and the shapes of the dots that form the various trends by touching one or another of them, right-clicking, choosing Format Data Series, then exploring the Line tab. Or not.

After you've exhausted your artistic inclinations, let's examine your data visualization. You'll see five curve that start near zero and hover near zero for a while before accelerating to form a J pattern that eventually becomes nearly vertical. These are the exponential growth patterns of 2, 4, 8, 12, and 16 day doubling times. They all have basically the same shape but inflect upward at different times out from Day 0 and produce drastically different case expectations at various points in time. The actual case counts form a waving line (could be modeled with a 5th or 6th order polynomial to capture those bends).

If you like, completely optionally, you can fit actual exponential curves to the doubling time lines. To do that, while the chart is active, you'd touch one of the curves, right-click it when the dots turn green and ask to Insert Trend Line. You can edit the colors and thickness of the lines in the Line tab that comes up but you are more interested in the Options tab, where you can pick Exponential and watch what happens. OpenOffice Calc can only generate linear, logarithmic, exponential, and power curves, but LibreOffice Calc can actually fit polynomial curves of various orders (a feature I asked for and LibreOffice responded) and moving average types. Of the two communities working on essentially the same open-source project, OpenOffice is more stable and polished, while LibreOffice has adventurous additions like higher-order polynomials, but tends to be a bit unstable and awkward. I personally use both of them all the time but recommended OpenOffice for this lab because of its ease and greater stability.

Analyzing trends in your line chart

Referring to your line chart, answer the following questions, typing your answers in the Answers tab (see bottom of spreadsheet). You may need to move back and forth between the Data and Answers tabs or you could touch your line chart when it is not in editable mode (it should show green dots on the border when you do), hit Control-C to copy it, move to the Answers tab, and then hit Control-V to copy the graph into the Answers sheet to make your life easier.

  1. Roughly when (how many days out) does the US actual case curve cross below the 2 day doubling time?
  2. When does it swing below a 4 day doubling time?
  3. When does it break below the 8 day doubling time (this is about when the hospital systems begin keeping up with the flow of patients)?
  4. Roughly what would be the doubling time on Day 140 (you need to interpolate between doubling time curves)?
  5. The USA curve "flattened" for a while, beginning roughly when?
  6. When does the resumption of an almost exponential expansion occur?
  7. What was going on in roughly the month before?
  8. What's going on in the latest few days of data?
  9. Come up with a reasonable pair of caseload numbers for Day 150. One estimate should be your worst-case scenario, based on extrapolation from the resumption of the more exponential trend over the last third of the actual data.
  10. Extrapolate from that possible softening/flattening of that trend in the last week or so on the graph for your more hopeful estimate. Using a mental pencil, draw that possible trend out for 10 days.

Comparing the US to the Rest of the World

In this analysis, we'll look only at the situation on Night 140 and only a comparison of the US and the COVID-19. In Column K are figures on US and world populations, COVID cases, and COVID deaths. First, calculate the percentage the US figure is in comparison with the world. In Cell L3, type in

K3*100/K4

Then, after hitting Enter, move your cursor back into Cell L3 and hit Control- C (to copy the forumla). Move your cursor to Cell L7 and hit Control-V (to paste it there) and do the same in Cell L11. The US, with a very small percentage of the world's population, has an inequitable share of the world's COVID cases and deaths.

Show these in pie charts. To do this, highlight Cells J2 through K4 and then click on the Chart Wizard up top. This time, choose the Pie chart type. Hit Next three times and then put a suitable title so that you remember this is the population comparison. Also, put a "neat line" around the chart itself.

Do the same for Cells J6 through K8 and label that one Cases. Then, do a repeat for Cells J10 through K12.

Group the three pie charts together with your line chart of trends through time somewhere on the right side of your spreadsheet where I can find them easily,

==========
Dr. Rodrigue's Home |   Geography Home |   EMER Home |   ES&P Home

BeachBoard |   MyCSULB |   CSULB Home |   Library |   Bookstore

==========

Document maintained by Dr. Rodrigue
First placed on the web 08/07/20
Last revision: 08/15/20

==========