docx files are actually zip archives with xml in them
Let me tell you something. I cannot tell you what company, but I have been tasked with putting Excel files in git “because they are just zip archives with xml” and it is just a disaster. Every time you save the document, it saves certain parts of the XML in arbitrary ways (for example, each image is in a list, and the order of that list is random every time), some metadata, like the last-modified time, is rewritten on every save, and finally each XML file is one single line. The git diffs are completely useless and noisy, and just looking at the Excel file will cause git to consider it updated. So sure, you can use git to snapshot your Office documents… But just don’t.
If you are, like I once was, the poor fool who has to maintain a bunch of VBA macros… Extract them into files and source control those. Make a script to extract them and to put them back, and use git-lfs for the actual workbook if you need a template workbook.
Now pardon me, I need to add this to the agenda for my next therapy.
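For anyone wondering what the “one single line” problem looks like in practice, here is a minimal Python sketch that opens an Office archive and re-indents every XML part so a diff tool at least gets lines to compare. The function names and the demo archive are made up for illustration, and this does nothing about the random ordering or the rewritten timestamps described above.

```python
import io
import zipfile
from xml.dom import minidom


def pretty_xml_parts(archive_bytes):
    """Return {part name: indented XML text} for every XML part in a zip archive."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in sorted(archive.namelist()):
            # Office archives keep their markup in .xml and .rels parts.
            if name.endswith((".xml", ".rels")):
                dom = minidom.parseString(archive.read(name))
                parts[name] = dom.toprettyxml(indent="  ")
    return parts


def make_demo_archive():
    """Build a tiny stand-in archive (hypothetical content, not a real workbook)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "xl/workbook.xml",
            "<workbook><sheets><sheet name='A'/></sheets></workbook>",
        )
    return buffer.getvalue()


if __name__ == "__main__":
    for name, text in pretty_xml_parts(make_demo_archive()).items():
        print(name)
        print(text)
```

Even with this, two saves of the same workbook can still differ, so treat it as a way to inspect the files rather than a fix.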
I will join that therapy session. This is pretty much what we did, except LFS, since it was “a requirement” to also track what the layout of the Excel file was like.
And even extracting and inserting the code was not stable. Excel will arbitrarily change the casing of “.path” to “.Path” for no reason, and add and remove whitespace between functions as it sees fit. It was such a pain. We also had a hard time handling Unicode strings, for instance ones containing a degree sign. And the list goes on.
Perhaps M$ does that specifically to make it hard to work with their formats? That way, tools like libre office stay not 100% compatible, preserving their market share.
I hear ya. But to be honest, what they are doing here is fine, and doesn’t seem malicious. There is an Office Open XML specification and they stick to it, but the spec doesn’t enforce everything. For instance, for the ordering of certain elements on the page, I bet you they store those elements in memory in an efficient data structure where ordering doesn’t matter, so when writing out the memory to disk, the easiest thing for them to do is just write it out in whatever order it appears in their data structure.
But there are probably other cases where they are not so innocent.
What’s a good way to learn about LaTeX and Git? I’ve tried learning on my own but it’s very overwhelming.
Overleaf is easy to use and has tutorials for LaTeX
It is a pity that Markdown does not have the possibilities of Latex.
Typst is Markdown-ish with the possibilities of LaTeX.
Never heard of LaTeX, but I can help you with Git.
What do you want to know?
Well, in this thread people were saying you can set up your own local git repository? What’s a newbie-friendly way of doing that? I’ve watched videos and understand that git is a version control system, but I can’t quite seem to grasp more than that.
You can just create a local repo with git init, and then never push to a (non-existent) remote repository. Git is decentralized, meaning that you always have a functional and complete repo when you’re working with it.
Depending on your tooling, you probably have a GUI for git if you’re a noob, which can usually “initialize a git repo” for you. I use the CLI and the lazygit TUI, so I can’t help with that.
Thank you, this clears some things up for me nonetheless.
I will answer this, I am sick right now but will return.
I learned latex by doing my engineering homework in it. I quit using latex because I kept doing my engineering homework in it and it turns out it sucks to do
I’m doing my math homework with latex this semester, I’m probably slower but it looks good and is more maintainable.
The issue I had was if it was big enough to need maintainability it was a group project and that meant Google docs or it was math and that meant scrawled on paper. Or technical writing which is the prof that told us to try latex in the first place but I was too busy that semester to learn it
You can do maths in LaTeX and I have used Overleaf for group projects before.
Fair, but this was 10 years ago, we were engineers, and it was hard enough explaining the work I did and the work I needed other people to do to them in a way these people understood.
Also I can’t do math on computers. Like arithmetic sure, but real math, that requires actually writing it down. Idk that’s probably my old lady trait these days
Presumably you do the work on paper and then type it up. I doubt professors would accept paper work nowadays.
Okay, I have a question. I would love to write my papers in LaTeX, but none of my colleagues use it. Is there a way to reasonably collaborate with coauthors who only use Word and for whom LaTeX would be confusing and difficult?
You don’t. You could try overleaf or some wysiwyg editor for LaTeX, but both need some getting used to and at least a minute amount of effort. Overleaf probably has the lowest barrier of entry (0 set up required), but is a paid service.
It’s possible to selfhost overleaf if you don’t want to pay them
It depends on what sort of collaboration. For things on which I was the sole author, like my dissertation, I leveraged the miracle that is pandoc. Every email my advisor got from me was a perfectly formatted Word doc with a flawless bibliography and he never had to learn what the hell LaTeX is.
But if you have multiple contributors going back and forth, or need to keep long-lived discussions in the track changes panel, you’re better off not trying to teach others a new tool. Unless they have a genuine interest in it, in which case the WYSIWYG editors can be fun.
Markdown and pandoc are like a match made in heaven for this. If you didn’t know, Markdown is a plain-text format with a simple syntax for formatting (which gets carried over when you use pandoc). It supports LaTeX equations, can carry metadata as a YAML block at the top of the file (which pandoc can use for custom behaviour), and supports citations with a bibliography file. Pandoc is a document converter between many formats and can produce Word files, PowerPoint slides, HTML files, PDFs via LaTeX (books, reports, Beamer presentations), etc. You can also give pandoc a template and it will produce output in that format. Not to mention that since it’s plain text, you can apply git version control and also use makefiles to rebuild the outputs as you go.
There is also R Markdown (or its newer successor, Quarto), which is the same Markdown pipeline but can also run code inside a section, attach the result to the Markdown file, and do the whole pandoc thing afterwards. Think of it as Jupyter Notebook-style literate programming with Markdown. Here’s a demonstration of its capabilities: https://youtu.be/_D-ux3MqGug
Assuming your colleagues can work with git but not LaTeX, you can set up a git repo with just Markdown files, collaborate on that, and have a makefile or Docker container generate the final Word or PDF file. Here’s a good example of a pandoc makefile: https://gist.github.com/kristopherjohnson/7466917
In the worst-case scenario that they only work with Word files, you can generate one from your Markdown files, share it with them, and pull the changes they send back on the Word document into your Markdown.
P.S. I assume Org-Mode can also substitute Markdown here in the pipeline. But I haven’t committed to it, so I’m not fully sure.
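The Markdown-to-Word step described above can be sketched roughly like this in Python. The file names (paper.md, paper.docx, template.docx) and the function names are placeholders for illustration; the only pandoc options used are -o and --reference-doc, and pandoc picks the output format from the file extension.

```python
import shutil
import subprocess


def pandoc_command(source, output, reference_doc=None):
    """Build the pandoc argument list for converting one file."""
    command = ["pandoc", source, "-o", output]
    if reference_doc is not None:
        # A reference document supplies the styles for Word output.
        command.append("--reference-doc=" + reference_doc)
    return command


def build(source, output, reference_doc=None):
    """Run pandoc if it is installed; otherwise just report the command."""
    command = pandoc_command(source, output, reference_doc)
    if shutil.which("pandoc") is None:
        print("pandoc not found; would run:", " ".join(command))
        return False
    subprocess.run(command, check=True)
    return True


if __name__ == "__main__":
    # Only prints the command, so it is safe to run without pandoc installed.
    print(" ".join(pandoc_command("paper.md", "paper.docx", "template.docx")))
```

A makefile rule would do the same job; the Python wrapper is just one way to keep the command in the repo next to the text.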
I learned LaTeX just so I could effectively use git in it.
I kinda want to learn LaTeX but I rarely write anything and I hate doing it so won’t have much use for it. It’s pretty neat though.
I also saw that there was a way to use LaTeX to generate PowerPoint, which seems extremely useful because PowerPoint is extremely annoying to use.
Yes, I also made my thesis slides in LaTeX, which was nice as I could reuse the figures.
I mean yes, you can use Beamer to make slides, but it is a lot less flexible than PowerPoint/LibreOffice Impress.
Git is like shit for Word documents
Just like word documents are shit for papers and theses/dissertations it turns out. The formatting alone is a nightmare.
Unzip the docx with a pre-commit hook
(This is not a serious suggestion)
.gitattributes can invoke Word on windows to diff versions, and there are plenty of open source scripts that can do it if you don’t have a copy of Word (or Windows) lying around.
But Word is like shit for papers. Use LaTeX instead.
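On the .gitattributes route mentioned above: git can run a “textconv” program that turns a binary file into text before diffing. Here is a minimal Python sketch of such a converter that does not need Word. The script name docx2txt.py and the function name are made up for this example, and it only pulls out paragraph text (no tables, footnotes, or formatting).

```python
# Hypothetical wiring (file name docx2txt.py is an assumption):
#   .gitattributes:   *.docx diff=docx
#   git config diff.docx.textconv "python3 docx2txt.py"
import io
import sys
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_bytes):
    """Return the document's paragraphs as plain text, one per line."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        # A paragraph's text is split across one or more w:t runs.
        runs = [node.text or "" for node in paragraph.iter(W + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def main(argv):
    # git passes the path of the file to convert as the only argument.
    if len(argv) != 2:
        print("usage: docx2txt.py FILE.docx", file=sys.stderr)
        return 2
    with open(argv[1], "rb") as handle:
        print(docx_to_text(handle.read()))
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

With that in place, git diff shows changed paragraphs instead of “Binary files differ”, though the stored file is still the binary .docx.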
But better for LaTeX
Why on Earth would you curse yourself with MS Office anyway, especially if writing docs is your professional responsibility?
Why not use Git+Markdown+Pandoc, have your copy, data and layout separate?
I understand that a lot of institutions/companies impose stylistic/technical requirements for docs and publications; that still doesn’t mean you gotta stay married to the worst tooling.
Why on Earth would you curse yourself with MS Office anyway
idk it says .docx in OP’s image
Oh sorry, I was too focused on calling out the silliness of the idea.
This is the way.
I encountered an engineering firm that did this. I wanted to do it too.
The company I worked for at the time (said engineering firm was doing subcontracting for us) was full of older business people who could never in a million years have wrapped their heads around the idea.
This is how you know they’re irrelevant. Take the time to learn or just retire.
I also met this at a contracting job. Drove me bonkers.
I wrote my thesis in Google Docs on my university account.
I also added a Makefile for mine (LaTeX), and it would add the commit hash to the front page (with an asterisk if the repository had uncommitted changes).
So, if I gave a draft to someone and got feedback, I’d know exactly which revision it was.
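A rough idea of how such a revision stamp could be generated, as a Python sketch rather than the Makefile rule described above (the original recipe is not shown, so everything here is an assumption). The macro name \gitrevision and the function names are invented for illustration; the git calls are git rev-parse --short HEAD and git status --porcelain --untracked-files=no.

```python
import subprocess


def format_revision(short_hash, status_output):
    """Append an asterisk when git reports modified tracked files."""
    dirty = "*" if status_output.strip() else ""
    return short_hash.strip() + dirty


def latex_macro(revision):
    """Return a LaTeX line defining a (hypothetical) \\gitrevision macro."""
    return "\\newcommand{\\gitrevision}{" + revision + "}\n"


def current_revision(repo_dir="."):
    """Ask git for the short commit hash and the state of tracked files."""
    short_hash = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    status = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=no"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return format_revision(short_hash, status)


if __name__ == "__main__":
    # Demo with made-up values; inside a repository, call current_revision().
    print(latex_macro(format_revision("abc1234\n", " M thesis.tex\n")), end="")
```

The build would write that line to a small .tex file that the title page pulls in with \input, then uses \gitrevision wherever the stamp should appear.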
This is brilliant
I wrote about half of my thesis in R Markdown using Git to backup my work. It’s fantastic because you can have your plots and statistics integrated directly into your paper and formatting in Markdown is much easier than straight up latex.
Counterpoint: advisor said no.
“Just use Word, everyone else does. I have never heard of this latex thing, so must be just some trendy useless overengineered software that does Word’s job but worse. Word can track changes just fine, and you can leave comments.” proceeds to strikethrough, highlight, and inline comment everything instead of using either of those features “I want to read what you wrote, not fight technology” proceeds to email you three separate times after forgetting to attach v28 about how a graphic looks wrong because Word ate it
you can still use word with git. it’s versioning first, diffing and merging only where possible. since you probably won’t branch you won’t need the latter, though.
Missing diffs is a problem, though.
I don’t get how Microsoft owns GitHub yet still hasn’t figured out any way to create a spec that would be git-compatible for Excel, Word, and PowerPoint files.
Easy, they want you to buy a onedrive subscription.
Preaching to the choir. “But Box already supports ‘versioning’, why use a confusing hacker tool instead?”
oh I see, you have a shared drive. i assumed you send it around as emails.
A fine assumption given what I wrote. Unfortunately, we did both depending on what he felt like at the time. Yes, for the same doc.
While correct in the sense of word and versioning via mail being a nightmare, I really don’t think you can expect anyone to learn latex just so they can comment in your document. I would have offered to send a pdf. Shoot me.
I’m going to send you a PDF, you can email me back with the notes or comments in the PDF itself, whatever suits your fancy, and I’ll keep those notes and send you a new PDF with them.
I did this and I had no issues with any of the theses I have submitted in my bachelor’s or master’s.
First year calculus teacher, thank you SO much for forcing us to write submissions in latex.
Also, Overleaf is a thing. This is not like my 1st year of uni; this is 11 years later or so. If your fucking professor never heard of LaTeX, they are just bad at academia and shouldn’t be teaching, honestly. It’s not just about the field knowledge.
That’s assuming they are competent enough to even use a PDF.
I’m going to send you a PDF, you can email me back with the notes or comments in the PDF itself, whatever suits your fancy, and I’ll keep those notes and send you a new PDF with them.
I do this, but from Word.
I learned LaTeX for my master’s thesis. Never used it again afterwards, except for my résumé.
Had to write a paper in college with 100 citations.
We used zotero for citation management, and it would dump a bibtex file on demand.
The paper was written in markdown, stored in git, and rendered through pandoc. We would cite a paper with parentheses and something resembling an id, like (lewis).
We gave pandoc a “citation style definition”, and it took care of everything. Every citation was perfectly formatted. The bibliography was perfectly formatted. Inline references were perfect. Numbering was perfect. All the metadata was ripped from pdfs automatically. It was downright magical.
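For a paper with that many citations, a small pre-flight check can catch keys that are cited in the Markdown but missing from the BibTeX export. This is a sketch, assuming pandoc’s @key citation syntax (for example [@lewis]) rather than the parenthesised form above; the function names and regular expressions are my own and only cover simple keys.

```python
import re

# Citation keys in pandoc Markdown look like @lewis or [@lewis; @smith2020].
# The lookbehind skips things like email addresses (user@host).
CITATION = re.compile(r"(?<!\w)@([A-Za-z0-9_]+(?:[:.-][A-Za-z0-9_]+)*)")

# BibTeX entries start like: @article{lewis,
BIB_ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s{}]+)\s*,")


def cited_keys(markdown_text):
    """Return the set of citation keys used in the Markdown source."""
    return set(CITATION.findall(markdown_text))


def defined_keys(bibtex_text):
    """Return the set of entry keys defined in the BibTeX file."""
    return set(BIB_ENTRY.findall(bibtex_text))


def missing_keys(markdown_text, bibtex_text):
    """Return, sorted, the keys that are cited but not defined."""
    return sorted(cited_keys(markdown_text) - defined_keys(bibtex_text))


if __name__ == "__main__":
    paper = "As shown by [@lewis; @smith2020, p. 3] and @jones."
    bibliography = "@article{lewis,\n  title={A}\n}\n@book{jones,\n  title={B}\n}\n"
    print(missing_keys(paper, bibliography))
```

The actual rendering would then be something along the lines of pandoc paper.md --citeproc --bibliography=refs.bib --csl=style.csl -o paper.pdf, with the file names being placeholders.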
yep, markdown is a great alternative to LaTeX if you don’t need fancy layouts or anything special
Markdown + pandoc means it goes through an intermediary latex template on the way to pdf land - which means your markdown can be a bastardized mix of markdown, html, latex commands, and sometimes more ;)
This is what I (a non coder who only knows git “download the Yuzu repo before they nuke it” and git “give me all the updates”) want to do when I get to write a paper. How much git did you have to learn to do this?
This is just the basic workflow: make changes to a file, then git add and commit. Other features of git like branching can be leveraged for greater control but are optional. What makes it magical is three separate systems working together in such harmony, namely git, Zotero and pandoc. Zotero is a citation manager that you can use to store scientific articles, papers, theses, etc., and it can produce a bibliography file. Pandoc, driven from the makefile, can reference that file along with the citations to create a cleanly typeset Word document or LaTeX PDF with precise numbering, table of contents, citations and a correctly formatted bibliography, without you needing to edit anything.
Exactly my workflow, but I used R Markdown!
I absolutely love R markdown! Being able to iterate on your analysis and report at the same time is fantastic
I recently read a tutorial titled: “how to annoy your collaborators: a git CI pipeline for LaTeX” ;)
git checkout -b final_version_revised2_REALLYFINALTHISTIME
git commit -am "holy fuck I hope this really is the last edit"
git push
Fourth panel from Mark Pilgrim:
- Writing a programming book that typesets your sample code into the book and also runs it to update the sample output shown in the book.
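The core of that trick fits in a few lines. Here is a toy Python sketch (function names invented here, not taken from any real book toolchain) that runs a sample snippet, captures what it prints, and pastes the fresh output under the listing.

```python
import contextlib
import io


def run_snippet(source):
    """Execute the snippet and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Fresh namespace so snippets cannot see each other's variables.
        exec(source, {})
    return buffer.getvalue()


def render_listing(source):
    """Return the snippet followed by its freshly captured output."""
    return source.rstrip("\n") + "\n\nOutput:\n" + run_snippet(source)


if __name__ == "__main__":
    print(render_listing("print(2 + 2)"))
```

R Markdown and Quarto, mentioned elsewhere in the thread, do the grown-up version of this.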